Much of the time in a transaction can reasonably be non-db-CPU time, be it IO wait or client CPU processing between queries. Note I'm not talking about transactions that run >10 seconds, just ones where the queries themselves are technically quite cheap. At 10% db-CPU usage, you get a 1 second transaction from just 100ms of CPU.
That many long-running transactions seems like a pretty unusual workload to me, and potentially one that runs into isolation issues. I can see running a few of these, but not a lot, especially at the same time?
A few (single digits, maybe low double digits) per database CPU core isn't necessarily too much. Of course the pattern can be conflict-heavy, in which case it would be problematic, but this isn't about absolute concurrency, it's about the concurrency needed to saturate the CPU.
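For a sense of scale, here's a minimal back-of-the-envelope sketch of that arithmetic (the 16-core server is just an assumed example):

```python
# Concurrency needed to saturate the database CPU when each transaction
# spends only a fraction of its wall-clock time on db CPU.
def concurrency_to_saturate(cores: int, db_cpu_fraction: float) -> float:
    return cores / db_cpu_fraction

# 100 ms of CPU inside a 1 s transaction -> 10% db-CPU usage.
# On a hypothetical 16-core server that's ~160 open transactions,
# i.e. roughly 10 per core.
print(concurrency_to_saturate(cores=16, db_cpu_fraction=0.1))  # 160.0
```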
In a properly optimized database the vast majority of queries will hit indices and most data will be in the in-memory cache, so the majority of transactions will be CPU or RAM bound. So increasing the number of concurrent transactions past that point will reduce throughput. There will be a few transactions waiting for I/O, but if the majority of transactions are waiting for I/O, it's either a horrifically inefficient database or very non-standard usage.
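One way to sanity-check the "most data is in cache" part on a live instance is to look at the shared-buffer hit ratio. A minimal sketch with psycopg2 (the connection string is a placeholder):

```python
import psycopg2

conn = psycopg2.connect("dbname=mydb user=app")  # placeholder DSN
with conn, conn.cursor() as cur:
    # blks_hit = reads served from shared buffers,
    # blks_read = reads that had to go to the OS cache / disk
    cur.execute("""
        SELECT sum(blks_hit)::float / nullif(sum(blks_hit) + sum(blks_read), 0)
        FROM pg_stat_database
    """)
    print(f"shared buffer hit ratio: {cur.fetchone()[0]:.2%}")
conn.close()
```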
Your arguments make sense for concurrent queries (though high-latency storage like S3 is becoming increasingly popular, especially for analytic loads).
But transactions aren't processing queries all the time. Often the application will do processing between sending queries to the database. During that time a transaction is open, but doesn't do any work on the database server.
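Concretely, the pattern being described looks something like this (a sketch with psycopg2; the table and the processing step are made up):

```python
import time
import psycopg2

def expensive_client_side_processing(payload):
    time.sleep(1.0)  # stand-in for real application work
    return payload

conn = psycopg2.connect("dbname=mydb")  # placeholder DSN
with conn:  # one transaction around the whole block
    with conn.cursor() as cur:
        cur.execute("SELECT payload FROM jobs WHERE id = %s FOR UPDATE", (42,))
        payload = cur.fetchone()[0]
        # The transaction (snapshot, row lock, backend) stays open while the
        # client does its own work here, even though the server is idle.
        result = expensive_client_side_processing(payload)
        cur.execute("UPDATE jobs SET result = %s WHERE id = %s", (result, 42))
conn.close()
```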
That is bad application architecture. Database work should be concentrated into minimal transactional units, and the connection should be released between those units. All data should be prepared before the unit starts, and additional processing should take place after the transaction has ended. Using long transactions will cause locks, even deadlocks, and should generally be avoided. That's my experience at least. Sometimes a business transaction has to be split into several database transactions.
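The structure being recommended, sketched under the same made-up schema: prepare everything first, keep the transactional unit short, release the connection, and do follow-up work after commit.

```python
import psycopg2

def prepare_input():       # hypothetical: all prep work, no transaction open
    return "computed result"

def post_process(result):  # hypothetical: notifications, logging, etc. after commit
    print("done:", result)

payload = prepare_input()

conn = psycopg2.connect("dbname=mydb")  # or borrow from a pool just for this unit
try:
    with conn:  # minimal transactional unit: a single short write
        with conn.cursor() as cur:
            cur.execute("UPDATE jobs SET result = %s WHERE id = %s", (payload, 42))
finally:
    conn.close()  # release the connection between units

post_process(payload)
```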
Your database usage should not involve application-level locks; MVCC will abort your transaction (so it can be retried) if needed to resolve concurrency conflicts.
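In practice that means the server aborts the losing transaction with a serialization failure and the client retries it. A minimal retry loop with psycopg2 (DSN and statement are placeholders):

```python
import psycopg2
from psycopg2 import errors
from psycopg2.extensions import ISOLATION_LEVEL_SERIALIZABLE

conn = psycopg2.connect("dbname=mydb")  # placeholder DSN
conn.set_isolation_level(ISOLATION_LEVEL_SERIALIZABLE)

for attempt in range(5):
    try:
        with conn, conn.cursor() as cur:  # commits on success, rolls back on error
            cur.execute("UPDATE counters SET value = value + 1 WHERE id = %s", (1,))
        break  # committed
    except errors.SerializationFailure:
        continue  # conflict detected and rolled back; just try again
else:
    raise RuntimeError("transaction kept conflicting after 5 attempts")
```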
If you aren't hitting IO (I don't mean HDDs) on a large fraction of queries, you either skipped putting a cache in front of the DB, or your data is very small, or you spent too much on RAM and too little on NVMe that wouldn't have been a bottleneck anyway.
A process that is blocked on IO, whether network or disk, will get taken off the CPU and another process put on. It doesn’t just waste the CPU until its quantum is gone.
redis is single-threaded but handles lots of connections (e.g. > 500) with much better performance than postgres. there's zero chance someone building postgres in 2025 would do one process per connection; I don't think there's any argument that it's a good design for performance. it's just a long-ago design choice that would be difficult to change now.
I disagree. If that were the case, pgBouncer wouldn't need to exist.
The problem of resource usage for many connections is real.
It's about queueing work, not running all these queries at the same time. You can run pgbouncer or you can have a pool on your backend. Having more connections won't make it go faster, so that really seems like a low-priority thing for postgres to me. Even if you integrated pooling into postgres, the overhead of auth would still take time for small queries anyway.
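For what it's worth, "a pool on your backend" can be as small as this (a sketch with psycopg2's built-in pool; sizes and DSN are placeholders):

```python
from psycopg2 import pool

db_pool = pool.ThreadedConnectionPool(minconn=2, maxconn=10, dsn="dbname=mydb")

def run_small_query():
    conn = db_pool.getconn()          # borrow a connection
    try:
        with conn, conn.cursor() as cur:
            cur.execute("SELECT 1")
            return cur.fetchone()[0]
    finally:
        db_pool.putconn(conn)         # hand it back instead of closing
```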
that's too simplistic.
there are many reasons to need something like pgbouncer.
1) not all workloads lend themselves to backend pools
2) not all backends can afford pooling
3) you don't always control your backend
4) you might have a lot of different backends connecting to the database. it's much simpler to just tell them to connect to pgbouncer.
5) you don't want to trust the backend team not to bring postgresql down with many simultaneous connections.
6) backend pools often misbehave
people don't wake up one day and just add another cog to their infrastructure randomly.