I think you misunderstood his point (and mine). There are usually much better ways to support availability and durability than to have multiple simultaneous write servers. On the contrary, having multiple write servers is usually worse for availability and durability because of the complexity.
For example, look at how Google Cloud SQL's aptly named "High Availability" configuration supports high availability: 1 primary and 1 standby. The standby is synced to the primary, and the roles are switched if a failover occurs.
But you can get more availability and more durability with much easier alternatives:
- Availability: spin up more read replicas.
- Durability: spin up more read replicas and also write to S3 asynchronously.
With Postgres on Neon, you can have both of these very easily. Same with Aurora.
(Disclaimer: I work at Neon)
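To make the read-replica side concrete, here's a minimal sketch of application-side read/write splitting in Python with psycopg2. The DSNs and table are placeholders, and in practice a pooler or the provider's own read endpoints (Neon, Aurora) would handle this routing for you; this is just to show the shape of it.

```python
# Minimal sketch of application-side read/write splitting (placeholder DSNs).
# Writes always go to the single primary; reads are spread across replicas.
import random
import psycopg2

PRIMARY_DSN = "host=primary.example.internal dbname=app user=app"   # hypothetical
REPLICA_DSNS = [
    "host=replica-1.example.internal dbname=app user=app",          # hypothetical
    "host=replica-2.example.internal dbname=app user=app",
]

def get_conn(readonly: bool):
    """Route read-only work to a random replica, everything else to the primary."""
    dsn = random.choice(REPLICA_DSNS) if readonly else PRIMARY_DSN
    return psycopg2.connect(dsn)

# Write path: hits the primary.
with get_conn(readonly=False) as conn, conn.cursor() as cur:
    cur.execute("INSERT INTO events (payload) VALUES (%s)", ("hello",))

# Read path: any replica will do (subject to replication lag).
with get_conn(readonly=True) as conn, conn.cursor() as cur:
    cur.execute("SELECT count(*) FROM events")
    print(cur.fetchone()[0])
```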
This doesn’t seem to provide higher write availability, and if the read replicas are kept consistent with the write replica, this design must surely degrade write availability as it improves read availability, since the write replica must update all the read replicas.
This also doesn’t appear to describe a higher durability design at all by normal definitions (in the context of databases at least) if it’s async…?
Yeah, this is not about write availability, but as the OP/author points out, scaling that is not the bottleneck for most apps.
I think you may have misunderstood the GP and are perhaps misusing terminology. You cannot meaningfully scale vertically to improve write availability, and if you care about availability a single machine (and often a primary/secondary setup) is insufficient.
Even if you only care about scaling reads, eventually the 1:N write:read replica ratio will become too costly to maintain, and long before you reach that point you likely sacrifice real-time isolation guarantees to maintain your write availability and throughput.
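For illustration, here's a toy Python example of the guarantee that tends to get traded away: with async replicas, a read that immediately follows a write can observe stale data. The DSNs and table are hypothetical.

```python
# Toy illustration of the read-after-write problem with async replicas
# (hypothetical DSNs and table; real systems vary in how much lag they allow).
import psycopg2

primary = psycopg2.connect("host=primary.example.internal dbname=app user=app")
replica = psycopg2.connect("host=replica-1.example.internal dbname=app user=app")

with primary, primary.cursor() as cur:
    cur.execute("UPDATE accounts SET balance = balance - 100 WHERE id = 1")

# Reading from the replica right away may still return the old balance if the
# write hasn't been replayed yet -- the "real-time" guarantee that gets traded
# for read scalability and write throughput.
with replica, replica.cursor() as cur:
    cur.execute("SELECT balance FROM accounts WHERE id = 1")
    print(cur.fetchone()[0])  # possibly stale
```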
> You cannot meaningfully scale vertically to improve write availability
Disagree. Even if you limit yourself to the cloud, r7i/r8g.48xl gets you 192 vCPU / 1.5 TiB RAM. If you really want to get silly, x2iedn.32xl is 128 vCPU / 4 TiB RAM, and you get 3.8 TiB of local NVMe storage for temp tablespace. The money you’ll pay ($16.5K - $44K/month, depending on the specific instance class) would pay for a similarly spec’d server in the same amount of time, though.
Which brings me to the novel concept of owning your own hardware. A quick look at Supermicro’s site shows a 2U w/ up to 1.92 PB of Gen5 NVMe, 8 TiB of RAM, and dual sockets. That would likely cost a wee bit more than a month of renting the aforementioned AWS VM, but a more reasonably spec’d one would not. Realistically, that much storage would be used as SDS for other DBs to use. NVMe-oF isn’t quite as fast as local disks, but it’s a hell of a lot faster than EBS et al.
The point is that you actually can vertically scale to stupidly high levels, it’s just that most companies have no idea how to run servers anymore.
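A back-of-the-envelope on the rent-vs-buy point above, using the rental range quoted in this thread and an assumed purchase price for a comparably specced box (the hardware figure is a guess, not a quote):

```python
# Rough rent-vs-buy payback, using the rental range quoted above and an
# ASSUMED purchase price for a comparably specced server (not a real quote).
monthly_rent_low, monthly_rent_high = 16_500, 44_000   # USD/month, from the comment
assumed_server_cost = 60_000                           # USD, hypothetical

for rent in (monthly_rent_low, monthly_rent_high):
    months = assumed_server_cost / rent
    print(f"${rent:,}/mo rental -> pays for the box in ~{months:.1f} months")
```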
> and if you care about availability a single machine (and often a primary/secondary setup) is insufficient.
Depending on your availability SLOs, of course, I think you’d find that a two-node setup (optionally having N read replicas) with one in standby would be quite sufficient. Speaking from personal experience on RDS (MySQL fronted with ProxySQL on K8s, load balanced with NLB), I experienced a single outage in two years. When it happened, no one noticed; it was so brief. Some notice-only alerts for 500s in Slack, but no pages went out.
> If you really want to get silly, x2iedn.32xl is 128 vCPU / 4 TiB RAM, and you get 3.8 TiB of local NVMe
This doesn't affect availability - except insofar as unavailability might be caused by insufficient capacity, which is not the typical definition.
> Depending on your availability SLOs, of course
Yes, exactly. Which is the point the GP was making. You generally make the trade-off in question not for performance, but because you have SLOs demanding higher availability. If you do not have these SLOs, then of course you don't want to make that trade-off.
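For concreteness, the downtime budgets those SLO levels translate to (plain arithmetic, nothing vendor-specific):

```python
# Downtime budget per year implied by common availability SLOs.
MINUTES_PER_YEAR = 365.25 * 24 * 60

for slo in (0.99, 0.999, 0.9999, 0.99999):
    allowed = (1 - slo) * MINUTES_PER_YEAR
    print(f"{slo:.3%} availability -> ~{allowed:,.0f} minutes of downtime/year")
```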
> This doesn't affect availability - except insofar as unavailability might be caused by insufficient capacity, which is not the typical definition.
I agree, but it seemed to me that GP was using it as such: "You cannot meaningfully scale vertically to improve write availability"
The big caveat about these configurations is the amount of time it takes to rebuild a replica due to the quantity of storage per node that has to be pushed over the network. This is one of the low-key major advantages of disaggregated storage.
I prefer to design my own hardware infrastructure but there are many operational tradeoffs to consider.
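To put the rebuild concern above in rough numbers, here is the bare transfer time just to move the bytes over the wire; the node size and link speed are assumptions, purely illustrative, and ignore compression, throttling, and apply time:

```python
# Time for a full replica rebuild just to move the bytes over the network
# (node size and NIC speed are illustrative assumptions).
node_size_tb = 100         # assumed data per node
link_gbps = 25             # assumed usable network bandwidth

bytes_total = node_size_tb * 1e12
seconds = bytes_total * 8 / (link_gbps * 1e9)
print(f"{node_size_tb} TB over {link_gbps} Gb/s ~= {seconds / 3600:.1f} hours")
```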
> you likely sacrifice real-time isolation guarantees to maintain your write availability and throughput
No worries there; in all likelihood isolation has been killed twice already. Once by running the DB on READ COMMITTED, and a second time by using an ORM like EF to read data into your application, fiddle with it in RAM, and write the new (unrelated-to-what-was-read) data back to the DB.
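A minimal sketch of that second kill, and the boring fix, in Python with psycopg2; the DSN, table, and columns are made up:

```python
# The ORM-style read-modify-write that loses updates under READ COMMITTED
# (hypothetical DSN/table/columns). Two workers doing this concurrently will
# silently overwrite each other's changes.
import psycopg2

conn = psycopg2.connect("host=primary.example.internal dbname=app user=app")

with conn, conn.cursor() as cur:
    cur.execute("SELECT qty FROM inventory WHERE sku = %s", ("ABC",))
    qty = cur.fetchone()[0]
    new_qty = qty - 1                      # fiddle with it in RAM...
    cur.execute("UPDATE inventory SET qty = %s WHERE sku = %s", (new_qty, "ABC"))

# The boring fix: push the arithmetic into the statement so the database
# serializes it, no isolation-level heroics required.
with conn, conn.cursor() as cur:
    cur.execute("UPDATE inventory SET qty = qty - 1 WHERE sku = %s", ("ABC",))
```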
In other words, we throw out all that performant 2010-2020 NoSQL & eventual consistency tech, and go back to good old fashioned SQL & ACID, because everyone knows SQL, and ACID is amazing. Then we use LINQ/EF instead because it turns out that no-one actually wants to touch SQL, and full isolation is too slow so that gets axed too.
No loss of committed transactions is acceptable to any serious business.
> I work at Neon
In my opinion, distributed DB solutions without synchronous write replication are DOA. Apparently a good number of people don't share this opinion because there's a whole cottage industry around such solutions, but I would never touch them with a 10 foot stick.