On Oct. 4, 2021, Facebook, Instagram, WhatsApp, and Messenger all went down for around six hours, from shortly before noon until just before 6 p.m. Eastern Daylight Time. The outage disrupted the work of small business owners, nonprofits, organizations providing lifesaving services, healthcare workers, and even politicians around the world. The company itself lost millions of dollars in ad revenue.
For a while, no one knew the cause of the massive outage, and Facebook issued an apology.
The outage was so bad that even its own employees couldn’t access company emails and internal communications platforms, and were locked out of their offices because their digital badges stopped working.
In a statement, Facebook said a cyberattack was not the cause:
Our engineering teams have learned that configuration changes on the backbone routers that coordinate network traffic between our data centers caused issues that interrupted this communication. This disruption to network traffic had a cascading effect on the way our data centers communicate, bringing our services to a halt.
Our services are now back online and we’re actively working to fully return them to regular operations. We want to make clear at this time we believe the root cause of this outage was a faulty configuration change. We also have no evidence that user data was compromised as a result of this downtime.
The outage appears to have stemmed from a configuration change in Facebook’s systems that left the company’s machines unable to communicate with each other.
Externally, some argued the issue could lie with Facebook’s domain name system (DNS) and Border Gateway Protocol (BGP). Bloomberg explained that the problem began with Facebook’s DNS, which is “like a phone book for the internet.” The news organization described it as “the tool that converts a web domain, like Facebook.com, into the actual internet protocol, or IP, address where the site resides.” When a DNS error occurs, users can no longer reach the site. In this case, Bloomberg reported, the problem had to do with Facebook’s BGP, which is like the “postal service” for the internet, determining the best paths for data to travel.
Bloomberg reported that, according to Cloudflare, a web security company, public records show that Facebook was making big changes to its BGP routes right before the outages occurred. Facebook hasn’t commented on the reasons for those changes.
Cloudflare wrote soon after Facebook released its statement: “Externally, we saw the BGP and DNS problems outlined in this post but the problem actually began with a configuration change that affected the entire internal backbone. That cascaded into Facebook and other properties disappearing and staff internal to Facebook having difficulty getting service going again.” You can read the full explanation here.
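To make the “phone book” and “postal service” analogy concrete, here is a minimal Python sketch of how a name lookup can fail once the routes to the name server are gone. It is a toy model, not a description of Facebook’s actual setup: the hostname, the addresses (taken from the reserved documentation range 192.0.2.0/24), and the names `DNS_RECORDS`, `advertised_routes`, `is_reachable`, and `lookup` are all assumptions made for illustration.

```python
import ipaddress

# Toy "phone book": hostname -> IP address.
# Addresses come from the documentation range 192.0.2.0/24, not from any real site.
DNS_RECORDS = {"www.example.com": "192.0.2.10"}

# Toy "postal service": the network prefixes currently advertised as reachable.
advertised_routes = {ipaddress.ip_network("192.0.2.0/24")}

# Hypothetical name server that lives inside the same advertised prefix.
DNS_SERVER_IP = "192.0.2.53"


def is_reachable(ip, routes):
    """Return True if at least one advertised prefix covers the address."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in routes)


def lookup(hostname, dns_server_ip, routes):
    """Resolve a hostname, but only if the name server itself can be reached."""
    if not is_reachable(dns_server_ip, routes):
        return None  # the phone book still exists, but no path leads to it
    return DNS_RECORDS.get(hostname)


# While the prefix is advertised, the lookup succeeds.
before = lookup("www.example.com", DNS_SERVER_IP, advertised_routes)

# Simulate the routes being withdrawn: the records are untouched,
# yet the lookup now fails because the name server is unreachable.
advertised_routes.clear()
after = lookup("www.example.com", DNS_SERVER_IP, advertised_routes)

print(before, after)
```

In this sketch the DNS records never change; only the routes do. That mirrors the point of the analogy: a routing problem can look like a DNS failure from the outside.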
Regardless, this had a cascading effect on Facebook’s many services around the world. The outage occurred just one day after a former Facebook employee and whistleblower revealed her identity. She had shared documents showing that the company knew about the many harms its services were causing, including to democratic elections, but put profit ahead of public good.