You have to be careful with the approach of using Postgres for everything. The way it locks tables and rows and the transaction isolation levels it guarantees are not immediately obvious to a lot of folks and can become a serious bottleneck for performance-sensitive workloads.

I've been a happy Postgres user for several decades. Postgres can do a lot! But like anything, don't rely on maxims to do your engineering for you.

Yes, performance can be a big issue with Postgres. And vertical scaling can really put a damper on things when you have a major traffic hit. Using it in place of Kafka misunderstands one of the great uses of Kafka, which is to help deal with traffic bursts. All of a sudden your Postgres server is overwhelmed, where a Kafka server would be fine.

It's worth noting that Oracle has solved this problem. It has horizontal multi-master scalability (not sharded) and a queue subsystem called TxEQ which scales like Kafka does, but it's also got the features of a normal MQ broker. You can dequeue a message into a transaction, update tables in that same transaction, then commit to remove the message from the queue permanently. You can dequeue by predicate, delay messages, use producer/consumer patterns etc. It's quite flexible. The queues can be accessed via SQL stored procs, or client driver APIs, or it implements a Kafka compatible API now too I think.

If you rent a cloud DB then it can scale elastically which can make this cheaper than Postgres, believe it or not. Cloud databases are sold at the price the market will bear not the cost of inputs+margin, so you can end up paying for Postgres as much as you would for an Oracle DB whilst getting far fewer features and less scalability.

Source: recently joined the DB team at Oracle, was surprised to learn how much it can do.

>And vertical scaling can really put a damper on things when you have a major traffic hit.

Wouldn't OrioleDB solve that issue though?

Not familiar with OrioleDB. I’ll look it up. May I ask how this helps? Just curious.

100%

Postgres isn’t meant to be a guaranteed permanent replacement.

It’s a common starting point for a simpler stack, one that retains a great deal of flexibility out of the box and increases velocity.

Starting with Postgres lets the bottlenecks reveal themselves, and then optimize from there.

That might mean a tweak to Postgres or its resources, or considering a jump to Kafka.

My strategy is to use postgres first. Get the idea off the ground and switch when postgres becomes the bottleneck.

It often doesn't.

Definitely, this is also one of the directions Rails is heading[1]: provide a basic setup most people can use out of the box. And if needed you can always plug in more "mature" solutions afterwards.

[1] https://rubyonrails.org/2024/11/7/rails-8-no-paas-required

I wish Postgres would add a durable queue-like data structure. But trying to make a durable queue that can scale beyond what a simple Redis instance can do starts to run into problems quickly.

Also, LISTEN/NOTIFY do not scale, and they introduce locks in areas you aren't expecting - https://news.ycombinator.com/item?id=44490510

SKIP LOCKED doesn't work for your use case?
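
For context, the `SKIP LOCKED` approach usually means each worker claims only rows that no other worker has locked, so workers don't block each other. A minimal sketch, assuming a hypothetical `jobs(id, payload)` table and a psycopg-style DB-API connection with `%s` placeholders (neither is from this thread):

```python
# Hypothetical table: jobs(id BIGSERIAL PRIMARY KEY, payload TEXT).
# Rows already locked by another worker are skipped instead of waited on.
CLAIM_SQL = """
DELETE FROM jobs
WHERE id IN (
    SELECT id
    FROM jobs
    ORDER BY id
    LIMIT %s
    FOR UPDATE SKIP LOCKED
)
RETURNING id, payload
"""


def claim_jobs(conn, limit=10):
    """Claim up to `limit` jobs inside the caller's transaction.

    `conn` is assumed to be a DB-API connection to Postgres (for example
    psycopg). The caller commits after processing the jobs; a rollback or
    crash undoes the DELETE, so the jobs become claimable again.
    """
    cur = conn.cursor()
    try:
        cur.execute(CLAIM_SQL, (limit,))
        return cur.fetchall()
    finally:
        cur.close()
```

Because the claim and the work happen in one transaction, a worker that dies mid-job does not lose the message. The trade-off is that the transaction stays open for as long as the job runs.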

It would probably work fine. It would also put at risk the jobs of people who managed to convince their enterprises that a dumb but fast server (Kafka) was actually a good idea.

This is true of any data storage. You have to understand the concurrency model and assumptions, and know where bottlenecks can happen. Even among relational databases there are significant differences.

Postgres is just fantastic software.

But anytime you treat a database, or a queue, like a black box dumpster, problems will ensue.

Exactly. Or worse, you treat one as a straightforward black-box swap-in replacement for another. If you're looking to scale, you _will_ need to code to the idiosyncrasies of your chosen solution.

Postgres doesn't scale into oblivion, but it can take some serious chunks of data once you start batching and making sure that every operation touches only a single row, with no transactions needed.

And then you are 99% of the way to Cassandra.

Of course the other 99% is the remaining 1%.

Nearly true, but you don't need to run a Cassandra cluster to ship your 3k msg/sec, and you can take smaller locks if you have a small number of senders that delete sent messages and send in chunks.
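
As a rough illustration of sending in chunks: at 3k msg/sec, chunks of 500 mean about six multi-row statements per second rather than 3,000 single-row ones. The chunk size and the `chunked` helper are assumptions for the sketch, not something from this thread:

```python
from itertools import islice


def chunked(messages, size):
    """Yield lists of at most `size` messages from any iterable."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(messages)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


# Each batch would become one multi-row INSERT (or one DELETE of sent ids)
# instead of one statement per message.
batches = list(chunked(range(3000), 500))
print(len(batches))  # 6
```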

Cassandra doesn't have ACID, so you will start dealing with tons of other problems.

True, but you have to have a really intensive workload to hit its limits, something on the order of tens of thousands of writes per second, and even then you can shard to a few instances. So yes, there is a limit - but in practice, not for most systems.

When someone says "just use Postgres", are they using the same instance for their data as for the queue?

When people say "just use postgres" it's because their immediate need is so low that this doesn't matter.

And the thing is, a server from 10 years ago running Postgres (with a backup) is enough for most applications to handle thousands of simultaneous users. Without even going into the kinds of optimization you are talking about. Adding ops complexity for the sake of scale in the exploratory phase of a product is a really bad idea when there's an alternative out there that can carry you until you have fit some market. (And for some markets, that's enough forever.)

It can be a different database in the same server or a separate server.

When you’re only doing hundreds or thousands of transactions to begin with, it doesn’t really have much impact out of the gate.

Of course there will be someone who will pull out something that won’t work but such examples can likely be found for anything.

We don’t need to fear simplification; it is easy to complicate later when the actual complexities reveal themselves.

You would typically want to use the same database instance for your queue as long as you can get away with it because then transaction handling is trivial. As soon as you move the queue somewhere else you need to carefully think about how you'll deal with transactionality.

Yes, I often use PG for queues on the same instance. Most of the time you don’t see any negative effects. For a new project with barely any users it doesn’t matter.