It's a legitimate question whether the talent exodus from AWS is starting to take its toll. I’m talking about all the senior, long-tenured folks jumping ship for greener pastures, not the layoffs this week, which mostly didn’t touch AWS (folks are saying that will happen in future rounds).
The fact that there was an outage is not unexpected… it happens… but all the stumbling, and how long it took to get things under control, was concerning.
Corey Quinn wrote an interesting article addressing that question: https://www.theregister.com/2025/10/20/aws_outage_amazon_bra...
Some good information in the comments as well.
If you average it out over the last decade, do we really have more outages now than before? Any complex system with lots of moving parts is bound to fail every so often.
It's the length of the outage that's striking. AWS us-east-1 has had a few serious outages in the last ~decade, but IIRC none took near 14 hours to resolve.
The horrible us-east-1 S3 outage of 2017[1] was around 5 hours.
[1] https://aws.amazon.com/message/41926/
Couldn’t this be explained by natural growth of the amount of cloud resources/data under management?
The more you have, the faster the backlog grows in case of an outage, so you need longer to process it all once the system comes back online.
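A back-of-the-envelope sketch of that backlog argument, assuming a very simple fluid model: work arrives at a constant rate, nothing is processed during the outage, and after recovery the system works through the backlog with whatever capacity is left over beyond new arrivals. The function name and all numbers are made up for illustration and are not AWS figures.

```python
def drain_hours(arrival_per_hour, capacity_per_hour, outage_hours):
    """Hours needed to clear the backlog once service resumes.

    Hypothetical model: constant arrival rate, constant capacity,
    and zero work processed while the outage lasts.
    """
    if capacity_per_hour <= arrival_per_hour:
        raise ValueError("capacity must exceed arrivals or the backlog never drains")
    backlog = arrival_per_hour * outage_hours        # work piled up during the outage
    spare = capacity_per_hour - arrival_per_hour     # capacity left after serving new arrivals
    return backlog / spare


# Same 2-hour outage at three illustrative scales.
small = drain_hours(100, 150, 2)            # small system with 50% headroom
big_scaled = drain_hours(1000, 1500, 2)     # 10x the load, capacity scaled 10x too
big_tight = drain_hours(1000, 1100, 2)      # 10x the load, headroom did not keep up
print(small, big_scaled, big_tight)
```

In this toy model the backlog does grow with scale, but the time to drain it only gets longer if spare capacity fails to grow along with the load.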
Not really. The issue was the time it took to correctly diagnose the problem, and then the cascading failures that resulted, which triggered more lengthy troubleshooting. Rightly or wrongly, it plays into the “the folks that knew best how all this works have left the building” vibes. Folks inside AWS say that’s not entirely inaccurate.
Hope not. Tech that runs smoothly is like the Maytag man.
A tech department that's running around with its hair on fire / always looking busy isn't one that builds trust.