Hacker News

Was still seeing SQS latency affecting my systems a full day after they gave the “all clear.” To me there are red flags all over this summary, particularly the part where they had no operational procedure for recovery. That seems impossible to me at a hyperscaler: did you never consider this failure scenario, ever? Or did you lose the engineers who did know?

Anyway, I appreciate that this seems pretty honest and descriptive.