Friday, June 22, 2012

Twitter blames global outage Thursday on 'cascading bug'

SAN FRANCISCO -- After nearly a year without any significant downtime, using Twitter had come to feel almost nothing like the service's early days. So when Twitter went down Thursday for more than an hour, it was something of a shock to its regular users.

After nearly six months of site reliability above 99 percent, Twitter was intermittently unreachable on the web and on mobile devices over the course of the day.

Twitter representatives offered little initial explanation for the outages until around 1:00pm PDT, and then only in short, 140-character bursts of information via the company's official Twitter account. It was not until about 4:30pm PDT that Twitter offered a lengthier explanation of the day's events.

"It's imperative that we remain available around the world, and today [Thursday] we stumbled," Twitter Vice President of Engineering Mazen Rawashdeh wrote in a blog post Thursday explaining the outages. "Not how we wanted today to go."

The problem, Rawashdeh explained, had to do with what is called a "cascading bug" -- a term that quickly spawned its own parody Twitter account -- in one of the company's infrastructure components. That bug was not confined to an individual element of the company's software, so it created a cascading effect, spreading to other parts of the software and affecting Twitter's 150 million-plus users.

Twitter's explanation came after a day of speculation that ran the gamut from purported DDoS attacks to potential problems with Twitter's recent physical headquarters relocation, and even to the far-fetched suggestion that a trend of animated GIF avatars could have caused the widespread outage.

Mundane as the cause may have been, it was an unwelcome reminder of the site's unreliable history. There was the "fail whale" of the early days, a cutesy cartoon that slowly grew as irksome as the "Blue Screen of Death" the more ubiquitous it became. Back then, site-wide outages were hardly newsworthy events, common enough that early adopters grew accustomed to them over time.

To read more, go to AllThingsDigital


Nypost.com
