> Most workloads don’t need distributed OLTP. Hardware got faster and cheaper. A single beefy machine can handle the majority of transactional workloads. Cursor and OpenAI are powered by a single-box Postgres instance. You’ll be just fine.
I thought this was such an important point. Sooooo many dev hours were spent figuring out how to do distributed writes, and for a lot of companies that work was never needed.
I thought it was the weakest point. The need for a distributed DB is rarely about performance; it's about availability and durability.
I think you misunderstood his point (and mine). There are usually much better ways to support availability and durability than running multiple simultaneous write servers. If anything, having multiple write servers is usually worse for availability and durability because of the added complexity.
For example, look at how Google Cloud SQL's aptly named "High Availability" configuration supports high availability: 1 primary and 1 standby. The standby is synced to the primary, and the roles are switched if a failover occurs.
But you can get more availability and more durability with much easier alternatives:
- Availability: spin up more read replicas.
- Durability: spin up more read replicas and also write to S3 asynchronously (see the WAL-archiving sketch below).
With Postgres on Neon, you can have both of these very easily. Same with Aurora.
(Disclaimer: I work at Neon)
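To make the durability bullet concrete: in vanilla Postgres, "write to S3 asynchronously" usually means WAL archiving, where each completed WAL segment is handed to archive_command. A minimal sketch of a script that command could invoke (the bucket name and file paths are made up for the example, and it assumes boto3 is available):

```python
#!/usr/bin/env python3
"""Illustrative WAL archiver. postgresql.conf would point at it with:
    archive_mode = on
    archive_command = 'python3 /opt/archive_wal.py %p %f'
%p is the segment's path, %f its file name; all names here are invented."""
import sys
import boto3

def archive_wal(wal_path: str, wal_name: str) -> int:
    s3 = boto3.client("s3")
    try:
        # Upload the completed WAL segment to object storage.
        s3.upload_file(wal_path, "example-wal-archive", f"wal/{wal_name}")
        return 0  # zero exit code tells Postgres the segment is archived
    except Exception:
        return 1  # non-zero: Postgres keeps the segment and retries later

if __name__ == "__main__":
    sys.exit(archive_wal(sys.argv[1], sys.argv[2]))
```

Since archiving happens after commit, the most recent segment can still be lost on failure, which is the async caveat raised in the replies below.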
This doesn’t seem to provide higher write availability, and if the read replicas are consistent with the write replica, this design must surely degrade write availability as it improves read availability, since the write replica must update all the read replicas.
This also doesn’t appear to describe a higher-durability design at all, by the normal definitions (in the context of databases, at least), if the write to S3 is async…?
Yeah, this is not about write availability, but as the OP/author points out, scaling that is not the bottleneck for most apps.
I think you may have misunderstood the GP and are perhaps misusing terminology. You cannot meaningfully scale vertically to improve write availability, and if you care about availability, a single machine (and often a primary/secondary setup) is insufficient.
Even if you only care about scaling reads, eventually the 1:N write:read replica ratio will become too costly to maintain, and long before you reach that point you likely sacrifice real-time isolation guarantees to maintain your write availability and throughput.
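To illustrate the isolation sacrifice: the standard read-scaling pattern routes writes to the primary and reads to whichever replica is handy, accepting replication lag. A rough sketch (the DSNs and the run_query helper are hypothetical, using psycopg2):

```python
import random
import psycopg2

# Hypothetical connection strings for one primary and N read replicas.
PRIMARY_DSN = "host=primary dbname=app"
REPLICA_DSNS = ["host=replica1 dbname=app", "host=replica2 dbname=app"]

def run_query(sql: str, params: tuple = ()):
    """Send writes to the primary and reads to a random replica.
    Replicas apply the primary's WAL asynchronously, so a read issued
    right after a write may return stale data: that is the 'real-time
    isolation' being traded for read throughput."""
    is_write = sql.lstrip().split()[0].upper() in {"INSERT", "UPDATE", "DELETE"}
    dsn = PRIMARY_DSN if is_write else random.choice(REPLICA_DSNS)
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(sql, params)
        return None if is_write else cur.fetchall()
```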
> You cannot meaningfully scale vertically to improve write availability
Disagree. Even if you limit yourself to the cloud, r7i/r8g.48xl gets you 192 vCPU / 1.5 TiB RAM. If you really want to get silly, x2iedn.32xl is 128 vCPU / 4 TiB RAM, and you get 3.8 TiB of local NVMe storage for temp tablespace. The money you’ll pay ($16.5K-$44K/month, depending on the specific instance class) would pay for a similarly spec’d server in the same amount of time, though.
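To put that payback claim in rough numbers (the hardware price here is an assumption, not a quote): if a comparably spec’d dual-socket box runs on the order of $50K, the $16.5K-$44K/month rental covers it in roughly one to three months.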
Which brings me to the novel concept of owning your own hardware. A quick look at Supermicro’s site shows a 2U w/ up to 1.92 PB of Gen5 NVMe, 8 TiB of RAM, and dual sockets. That would likely cost a wee bit more than a month of renting the aforementioned AWS VM, but a more reasonably spec’d one would not. Realistically, that much storage would be used as SDS (software-defined storage) for other DBs to use. NVMe-oF isn’t quite as fast as local disks, but it’s a hell of a lot faster than EBS et al.
The point is that you actually can vertically scale to stupidly high levels, it’s just that most companies have no idea how to run servers anymore.
> and if you care about availability a single machine (and often a primary/secondary setup) is insufficient.
Depending on your availability SLOs, of course, I think you’d find that a two-node setup (optionally having N read replicas) with one in standby would be quite sufficient. Speaking from personal experience on RDS (MySQL fronted with ProxySQL on K8s, load balanced with NLB), I experienced a single outage in two years. When it happened, it was so brief that no one noticed. Some notice-only alerts for 500s in Slack, but no pages went out.
> If you really want to get silly, x2iedn.32xl is 128 vCPU / 4 TiB RAM, and you get 3.8 TiB of local NVMe
This doesn't affect availability - except insofar as unavailability might be caused by insufficient capacity, which is not the typical definition.
> Depending on your availability SLOs, of course
Yes, exactly. Which is the point the GP was making. You generally make the trade-off in question not for performance, but because you have SLOs demanding higher availability. If you do not have these SLOs, then of course you don't want to make that trade-off.
> This doesn't affect availability - except insofar as unavailability might be caused by insufficient capacity, which is not the typical definition.
I agree, but it seemed to me that GP was using it as such: "You cannot meaningfully scale vertically to improve write availability"
The big caveat about these configurations is the amount of time it takes to rebuild a replica due to the quantity of storage per node that has to be pushed over the network. This is one of the low-key major advantages of disaggregated storage.
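Back-of-envelope (illustrative numbers, not from the thread): reseeding a replica that holds 50 TB over a dedicated 10 Gbps link is 50e12 bytes × 8 / 10e9 bps ≈ 40,000 s, call it 11 hours of raw copy before the new node can even start replaying WAL to catch up; on shared links or bigger nodes, you're into multi-day rebuilds. Disaggregated storage sidesteps this because a new compute node just attaches to the existing storage.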
I prefer to design my own hardware infrastructure but there are many operational tradeoffs to consider.
> you likely sacrifice real-time isolation guarantees to maintain your write availability and throughput
No worries there: in all likelihood isolation has been killed twice already. Once by running the DB on READ COMMITTED, and a second time by using an ORM like EF to read data into your application, fiddle with it in RAM, and write the new (unrelated-to-what-was-read) data back to the DB.
In other words, we throw out all that performant 2010-2020 NoSQL & eventual consistency tech and go back to good old-fashioned SQL & ACID, because everyone knows SQL, and ACID is amazing. Then we use LINQ/EF instead because it turns out that no one actually wants to touch SQL, and full isolation is too slow so that gets axed too.
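To make the double kill concrete, here's the classic lost update that the READ COMMITTED + read-modify-write combo produces. A minimal sketch (the accounts table, column names, and DSN are invented; plain psycopg2 stands in for EF to keep it short):

```python
# Hypothetical schema: accounts(id int primary key, balance int).
import psycopg2

def add_bonus(dsn: str, account_id: int, bonus: int) -> None:
    conn = psycopg2.connect(dsn)
    with conn, conn.cursor() as cur:  # Postgres default: READ COMMITTED
        cur.execute("SELECT balance FROM accounts WHERE id = %s",
                    (account_id,))
        balance = cur.fetchone()[0]    # read into application memory
        new_balance = balance + bonus  # fiddle with it in-RAM
        # Blind write-back: clobbers anything committed between our
        # SELECT and this UPDATE.
        cur.execute("UPDATE accounts SET balance = %s WHERE id = %s",
                    (new_balance, account_id))
    conn.close()
```

Run two of these concurrently and both read the same balance, so the later commit silently erases the earlier bonus. SELECT ... FOR UPDATE or SERIALIZABLE would prevent it, but that's exactly the isolation that got axed.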
No loss of committed transactions is acceptable to any serious business.
> I work at Neon
In my opinion, distributed DB solutions without synchronous write replication are DOA. Apparently a good number of people don't share this opinion, because there's a whole cottage industry around such solutions, but I would never touch them with a 10-foot pole.
Something tells me neither Cursor nor OpenAI has demanding write workloads, so they would probably do just as fine using a flat file. I'm honestly curious what use either would have for queries that you couldn't get with a filesystem.
Certainly neither product has much obvious need for OLTP workloads. Hell, neither has any need for transactions at all. You're just paying them for raw CPU.
Update: in my mind, this reflects analytics on user queries. Just further reason to run your own models, I guess...
It's not just analytics. ChatGPT saves all of your conversation history - I don't know if they save the full conversation text in Postgres, but I'd assume they at least save conversation metadata there.
You may not want this from a privacy perspective, but as a user I find it to be a very useful feature: e.g., I can see my full history, and I can easily share conversations with a share link (and it's the exact version of that conversation, not a URL whose contents can change).