SAN FRANCISCO – Facebook said Thursday that it had fixed a technical error that led to lengthy outages across its various services, including Instagram, WhatsApp and Messenger.
The interruption lasted almost 24 hours on some of the services and was the longest in Facebook's recent history. It was an eye-opening reminder that even the most powerful internet companies, using the best computer science and technology, can still be brought down by human error.
"All the major online companies have several lines of defense, but sometimes a coding error made by one engineer can spread across thousands of computers and cause big failures," said Alex Stamos, a former chief security officer at Facebook and a lecturer at Stanford University. "In other words, restarting something as complex as Facebook is very, very hard."
The little mistake had major consequences. Instagram users could not see other profiles, WhatsApp users could not send messages, and news feeds in Facebook's main app stopped loading.
Downdetector, which likens itself to a weather forecast for the internet, said it had received 7.5 million problem reports on Facebook apps. By comparison, widespread problems on YouTube in October prompted only 2.7 million reports. Downdetector measures service interruptions in part by counting reports from users who are having problems.
"Never before have we seen such a big outage," said Tom Sanders, co-founder of Downdetector.
Early Thursday, Facebook was able to bring most of its systems back online. The company is still trying to figure out how the error reverberated throughout its network. Facebook officials stressed that the problem had not been caused by hacking or a cyberattack such as a so-called denial-of-service attack, which floods servers with a wave of traffic that causes them to stop working.
For many years, Facebook has recruited engineers on the promise that within a few weeks they can release code that affects billions of people.
"I still get a great deal of fulfillment from seeing that my work has a meaningful impact on so many people's lives," a testimonial from an employee says on Facebook's careers recruiting site.
But it also means that an individual's error can have widespread consequences, especially since Facebook is working on a recently detailed plan to consolidate the infrastructure of its "family of apps." The more tightly integrated a computer network becomes, the more likely it is that a small technical problem can grow into a large one.
Facebook, like other internet giants, relies on never going offline. That predictability has helped it become one of the most influential, and most criticized, companies in the world. An estimated two billion-plus people use one or more of its services each day.
As people become more reliant on Facebook's services, for chatting with family and friends as well as for work, they have higher expectations of performance, Mr. Sanders said.
"Tolerance for downtime is diminishing, and people constantly expect services to work flawlessly 365 days a year," he said.
Although the incident was an irritation to many users, it had more pressing consequences for businesses, such as advertisers, that rely on Facebook's network to generate revenue.
Kieley Taylor, global social manager at the advertising agency GroupM, said that her company had not been able to access Facebook's system, which meant new advertising campaigns were delayed.
"It's never a good day for an outage," she said. "Fortunately, it was relatively short, but it was a complete outage."
Her company was still trying to determine how many ad campaigns had been hit. Ms. Taylor said that because Facebook's advertising system worked on a pay-as-you-go basis, GroupM would not need to seek refunds from Facebook for undelivered ad campaigns.
GroupM diverted advertising to Google search, YouTube and other websites, but said Facebook had a unique reach given its size.
"Because of all the people on the platform, it continues to be a truly powerful digital marketing platform," Ms. Taylor said.