Amazon’s huge cloud outage caused by “unexpected behavior”

Amazon said automated processes in its cloud business caused outages across the internet this week, affecting everything from Disney theme parks and Netflix videos to robotic vacuum cleaners and Adele ticket sales.

In a statement on Friday, Amazon said that the problem started on Tuesday when an automated computer program – designed to make the network more reliable – ended up causing a “large number” of systems to behave strangely. This in turn created a wave of activity on Amazon’s network, which ultimately prevented users from accessing any of their cloud services.

“Basically, a bad piece of code was executed automatically and it caused a snowball effect,” said Forrester analyst Brent Ellis. The outage persisted “because their internal controls and monitoring system were disconnected by the traffic storm caused by the original problem.”

Amazon explained the error in a highly technical statement posted online. The problems started around 07:30 Pacific time on Tuesday and lasted for several hours before Amazon managed to fix the problem. Meanwhile, social media lit up with complaints from consumers who were angry that their smart home gadgets and other internet-connected services had suddenly stopped working.

Some experts said that the explanation does not fully help users understand what went wrong.

“They do not explain what this unexpected behavior was, and they did not know what it was. So they guessed when they tried to fix it, and that’s why it took so long,” said Corey Quinn, cloud economist at Duckbill Group.

AWS is generally a reliable service. Amazon’s cloud division last suffered a major incident in 2017, when an employee accidentally shut down more servers than intended during repairs to a billing system. Still, the recent outage reminded the world how many products and services are concentrated in shared data centers run by just a handful of major technology companies, including Amazon, Microsoft and Alphabet’s Google.

There is no easy solution to the problem. Some analysts believe that companies should duplicate their services across multiple cloud computing providers so that no single failure takes them offline. Others say a “multi-cloud” strategy would be impractical and could make companies even more vulnerable, because they would be exposed to every provider’s outages, not just those of AWS.

“We know that this incident affected many customers in significant ways,” the company said in the jargon-filled statement. “We will do everything we can to learn from this incident and use it to further improve our availability.”
