> "surely if I send request to 5 nodes some of that will land on disk in reasonably near future?"

That would be asynchronous replication. But IIUC the author is instead advocating for a distributed log with synchronous quorum writes.

But we know this is not actually robust because storage and power failures tend to be correlated. The most recent Jepsen analysis again highlights that it's flawed thinking: https://jepsen.io/analyses/nats-2.12.1

The Aurora paper [0] goes into detail of correlated failures.

> In Aurora, we have chosen a design point of tolerating (a) losing an entire AZ and one additional node (AZ+1) without losing data, and (b) losing an entire AZ without impacting the ability to write data. [..] With such a model, we can (a) lose a single AZ and one additional node (a failure of 3 nodes) without losing read availability, and (b) lose any two nodes, including a single AZ failure and maintain write availability.

As for why this can be considered durable enough, section 2.2 gives an argument based on their MTTR (mean time to repair) of storage segments

> We would need to see two such failures in the same 10 second window plus a failure of an AZ not containing either of these two independent failures to lose quorum. At our observed failure rates, that’s sufficiently unlikely, even for the number of databases we manage for our customers.

[0] https://pages.cs.wisc.edu/~yxy/cs764-f20/papers/aurora-sigmo...

[deleted]

I believe testing over paper claims