Amazon Web Services (s amzn) has issued a postmortem of its Christmas Eve cloud computing outage that took many services, most notably Netflix (s nflx), offline for a portion of the night. The cause, according to AWS: A developer accidentally deleted Elastic Load Balancer state data in Amazon’s US-East region that the service’s control plane needs in order to manage load balancers in that region.
All told, the outage (which began at 12:24 p.m. PT) lasted 23 hours and 41 minutes and, at its peak, crippled 6.8 percent of load balancers in the region while leaving others running, albeit unable to scale or be modified by users. The Elastic Load Balancer team didn’t identify the root cause of the problem for several hours, at which point it began the challenging process of attempting to restore the state data to a point in time just before its…