Did DNS take it down, or did a pattern of latent failures take it down? DNS was restored fairly quickly!

Nobody is saying that locks aren't interesting or important.

The droplet lease timeouts were an aggravating factor in the severity of the incident, but they were not causative. Absent a trigger, the droplet leases never experience congestive failure.

The race condition was necessary and sufficient for collapse. Absent corrective action, it always leads to AWS going down. With corrective action and no other aggravating factors, the failure would have been minor, but the race condition is always the cause of this failure.

This doesn’t really matter. This type of error gets the whole 5 Whys treatment, and every why needs to get fixed. Both problems will certainly have an action item.

It is not my claim that AWS is going to handle this badly, only that this thread is.