A major outage at Cloudflare on Tuesday, November 18, 2025, took down or slowed access to a wide range of popular websites and services, including OpenAI, Spotify, X (formerly Twitter), and Grindr. The incident, the company’s worst since 2019, highlights how a small number of infrastructure providers underpin the modern internet, and how vulnerable that system remains.
What Happened?
The outage began around 3:30 AM PT and lasted more than three hours, with full recovery reported by the end of the day. Cloudflare CEO Matthew Prince confirmed the problem was caused not by a cyberattack but by an internal software failure. Specifically, a database change generated an unusually large configuration file that the system could not process, causing cascading failures across the network.
Cloudflare identified the faulty file and reverted to a previous version, restoring traffic flow by 6:30 AM PT. Prince issued a public apology, acknowledging the severity of the disruption. “Given Cloudflare’s importance in the Internet ecosystem any outage of any of our systems is unacceptable,” he stated.
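For readers who want to see the general idea in code, the following is a minimal sketch of one common defensive pattern: validate a new configuration before activating it, and keep the last working version in service if validation fails. It is not Cloudflare's code. The names `ConfigStore`, `apply`, and `MAX_ENTRIES`, and the limit of 200 entries, are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_ENTRIES = 200  # hypothetical hard limit on entries in a configuration file


@dataclass
class ConfigStore:
    """Holds the configuration currently in service."""

    active: Optional[List[str]] = None

    def apply(self, candidate: List[str]) -> bool:
        """Activate candidate if it passes the size check.

        Returns True if the candidate was activated, False if it was
        rejected and the previous configuration was left in place.
        """
        if len(candidate) > MAX_ENTRIES:
            # Reject instead of crashing: the last good config keeps serving.
            return False
        self.active = candidate
        return True


store = ConfigStore()
store.apply([f"rule-{i}" for i in range(60)])   # within the limit, accepted
store.apply([f"rule-{i}" for i in range(400)])  # oversized, rejected
print(len(store.active))  # prints 60: the earlier config is still active
```

The design choice here is to treat an oversized file as a rejected update rather than a fatal error, so a bad file degrades freshness instead of availability.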
The Scale of the Impact
Approximately 20% of all websites rely on Cloudflare’s services, making the outage far-reaching. Downdetector, a service for reporting outages (owned by the same parent company as CNET), logged over 2.1 million reports during the event, with the US, UK, Japan, and Germany most affected.
Beyond Cloudflare itself, users reported issues with X (320,549 reports), League of Legends (130,260 reports), Spotify (93,377 reports), OpenAI (81,077 reports), and Grindr (25,031 reports). The outage exposed how heavily many digital services depend on a few key infrastructure players.
A Recurring Problem?
The Cloudflare disruption follows similar incidents at Amazon Web Services (AWS) and Microsoft Azure in recent months. These failures raise questions about concentration risk in modern internet infrastructure. Forrester Research analyst Brent Ellis estimates the Cloudflare outage alone may have caused $250 million to $300 million in direct and indirect losses.
The incident also underscores the fragility of artificial intelligence infrastructure. The disruption to OpenAI, a leading AI platform, highlights how even cutting-edge technologies are reliant on stable underlying systems. As Cornell University’s Sarah Kreps noted, “The issue exposes the reality that this multibillion, even trillion-dollar investment in AI is only as reliable as its least scrutinized third-party infrastructure.”
The reliance on centralized services creates systemic vulnerabilities. Outages like this demonstrate that even the most advanced digital tools are susceptible to failure if the foundation is unstable.
The Cloudflare outage serves as a stark reminder that the internet, despite its ubiquity, remains a complex and fragile system. While the company has apologized and taken steps to prevent recurrence, the incident underscores the need for greater resilience and diversification in critical infrastructure.
