Yeah, this is not about write availability, but as the OP/author points out, scaling that is not the bottleneck for most apps.

I think you may have misunderstood the GP and are perhaps misusing terminology. You cannot meaningfully scale vertically to improve write availability, and if you care about availability a single machine (and often a primary/secondary setup) is insufficient.

Even if you only care about scaling reads, the 1:N write:read replica ratio eventually becomes too costly to maintain, and long before you reach that point you will likely have sacrificed real-time isolation guarantees to preserve write availability and throughput.
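
For concreteness, this is roughly what that read-scaling arrangement ends up looking like in application code (Python/psycopg and the connection strings here are illustrative assumptions, not anything from this thread):

```python
# Illustrative read/write split across one primary and N replicas.
import random
import psycopg  # assumed driver, purely for illustration

PRIMARY_DSN = "host=db-primary dbname=app"
REPLICA_DSNS = ["host=db-replica-1 dbname=app", "host=db-replica-2 dbname=app"]

def write(sql, params=()):
    # Every write still funnels through the single primary; replicas and
    # vertical scaling do nothing for this path.
    with psycopg.connect(PRIMARY_DSN) as conn:
        conn.execute(sql, params)

def read(sql, params=()):
    # Reads fan out across replicas, but any row read here can trail the
    # primary by the current replication lag - i.e. you have already traded
    # away read-your-writes / real-time isolation for read throughput.
    with psycopg.connect(random.choice(REPLICA_DSNS)) as conn:
        return conn.execute(sql, params).fetchall()
```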

> You cannot meaningfully scale vertically to improve write availability

Disagree. Even if you limit yourself to the cloud, r7i/r8g.48xl gets you 192 vCPU / 1.5 TiB RAM. If you really want to get silly, x2iedn.32xl is 128 vCPU / 4 TiB RAM, and you get 3.8 TiB of local NVMe storage for temp tablespace. The money you'd pay ($16.5K-$44K/month, depending on the instance class) would pay for a similarly spec'd server outright in short order, though.
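
Back-of-the-envelope on the rent-vs-buy point (the purchase price below is an assumed figure for illustration, not a quote):

```python
# How many months of the AWS instance pay for a comparable physical box.
cloud_per_month_low = 16_500    # USD/month, low end of the range above
cloud_per_month_high = 44_000   # USD/month, high end
server_purchase = 60_000        # assumed price for a similarly spec'd dual-socket server

print(server_purchase / cloud_per_month_high)  # ~1.4 months at the high end
print(server_purchase / cloud_per_month_low)   # ~3.6 months at the low end
```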

Which brings me to the novel concept of owning your own hardware. A quick look at Supermicro's site shows a 2U w/ up to 1.92 PB of Gen5 NVMe, 8 TiB of RAM, and dual sockets. That would likely cost a wee bit more than a month of renting the aforementioned AWS VM, but a more reasonably spec'd one would not. Realistically, that much storage would be used as SDS for other DBs to use. NVMe-oF isn't quite as fast as local disks, but it's a hell of a lot faster than EBS et al.

The point is that you actually can vertically scale to stupidly high levels, it’s just that most companies have no idea how to run servers anymore.

> and if you care about availability a single machine (and often a primary/secondary setup) is insufficient.

Depending on your availability SLOs, of course, but I think you'd find that a two-node setup (optionally with N read replicas), one node in standby, would be quite sufficient. Speaking from personal experience on RDS (MySQL fronted with ProxySQL on K8s, load balanced with NLB), I experienced a single outage in two years. When it happened, it was so brief that no one noticed. Some notice-only alerts for 500s in Slack, but no pages went out.
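
Part of why nobody notices an outage that brief is that the application layer just retries the handful of queries that hit the window. A rough sketch of that pattern (the connector, host, and retry policy are my assumptions, not the actual setup):

```python
# Retry wrapper: during a brief primary -> standby failover, the few queries
# that land in the window fail fast and are retried once the proxy points at
# the new primary.
import time
import mysql.connector
from mysql.connector import Error

def run_query(sql, params=(), attempts=5, backoff=0.5):
    for attempt in range(attempts):
        try:
            conn = mysql.connector.connect(host="proxysql", database="app")
            try:
                cur = conn.cursor()
                cur.execute(sql, params)
                rows = cur.fetchall() if cur.with_rows else None
                conn.commit()
                return rows
            finally:
                conn.close()
        except Error:
            if attempt == attempts - 1:
                raise
            time.sleep(backoff * (attempt + 1))  # a seconds-long failover is absorbed here
```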

> If you really want to get silly, x2iedn.32xl is 128 vCPU / 4 TiB RAM, and you get 3.8 TiB of local NVMe

This doesn't affect availability - except insofar as unavailability might be caused by insufficient capacity, which is not the typical definition.

> Depending on your availability SLOs, of course

Yes, exactly. Which is the point the GP was making. You generally make the trade-off in question not for performance, but because you have SLOs demanding higher availability. If you do not have these SLOs, then of course you don't want to make that trade-off.

> This doesn't affect availability - except insofar as unavailability might be caused by insufficient capacity, which is not the typical definition.

I agree, but it seemed to me that GP was using it as such: "You cannot meaningfully scale vertically to improve write availability"

The big caveat about these configurations is the amount of time it takes to rebuild a replica due to the quantity of storage per node that has to be pushed over the network. This is one of the low-key major advantages of disaggregated storage.
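
To put a rough number on it (the storage size, NIC speed, and usable throughput are all assumed for illustration):

```python
# Rough math on rebuilding a replica from scratch over the network.
storage_tib = 100        # assumed data per node that has to be copied
link_gbps = 25           # assumed NIC speed
usable_fraction = 0.7    # protocol overhead, throttling, competing traffic

bytes_total = storage_tib * 1024**4
bytes_per_sec = link_gbps / 8 * 1e9 * usable_fraction
print(bytes_total / bytes_per_sec / 3600)  # ~14 hours for 100 TiB at ~2.2 GB/s
```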

I prefer to design my own hardware infrastructure but there are many operational tradeoffs to consider.

> you likely sacrifice real-time isolation guarantees to maintain your write availability and throughput

No worries there; isolation has in all likelihood been killed twice already. Once by running the DB on READ COMMITTED, and a second time by using an ORM like EF to read data into your application, fiddle with it in RAM, and write the new (unrelated-to-what-was-read) data back to the DB.
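
Concretely, that second kill is just the read-modify-write pattern below (plain SQL via Python rather than EF, to keep it self-contained; the table and driver are made up for illustration):

```python
import psycopg  # assumed driver, purely for illustration

def apply_discount(conn, order_id, discount):
    # 1) read the row into application memory
    row = conn.execute(
        "SELECT total FROM orders WHERE id = %s", (order_id,)
    ).fetchone()

    # 2) fiddle with it in RAM
    new_total = row[0] * (1 - discount)

    # 3) write it back. Under READ COMMITTED another transaction can change
    # `total` between steps 1 and 3, and this UPDATE silently overwrites it:
    # the classic lost update, with no serializable isolation to catch it.
    conn.execute(
        "UPDATE orders SET total = %s WHERE id = %s", (new_total, order_id)
    )
    conn.commit()
```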

In other words, we throw out all that performant 2010-2020 NoSQL & eventual consistency tech and go back to good old-fashioned SQL & ACID, because everyone knows SQL, and ACID is amazing. Then we use LINQ/EF instead because it turns out that no one actually wants to touch SQL, and full isolation is too slow so that gets axed too.