How do you implement "unique monotonically-increasing offset number"?
The naive approach with a sequence (or the serial type, which uses a sequence automatically) does not work. Transaction "one" gets number "123", transaction "two" gets number "124". Transaction "two" commits, so the table now contains rows "122", "124" and readers can start to process it. Then transaction "one" commits with its "123" number, but readers are already past "124". And transaction "one" might never commit for various reasons (e.g. the client just got its power cut), so waiting for "123" forever doesn't cut it either.
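To make the race concrete, here is a minimal sketch (table, sequence, and id values are made up):

    -- assumes: CREATE TABLE events (id bigint PRIMARY KEY, payload text);
    --          CREATE SEQUENCE event_seq;
    BEGIN;                                                   -- transaction "one"
    INSERT INTO events VALUES (nextval('event_seq'), 'a');   -- reserves id 123
    -- meanwhile transaction "two" runs the same INSERT, reserves 124, and
    -- COMMITs; a reader polling WHERE id > 122 sees 124 and moves its cursor
    COMMIT;   -- 123 becomes visible only now, behind the reader's cursor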
Notifications can help with this approach, but then you can't restart old readers (and you don't need monotonic numbers at all).
It's a tricky problem; I'd recommend reading DDIA, which covers this extensively:
https://www.oreilly.com/library/view/designing-data-intensiv...
You can generate distributed monotonic number sequences with a Lamport Clock.
https://en.wikipedia.org/wiki/Lamport_timestamp
The Wikipedia entry doesn't describe it as well as that book does.
It's not the end of the puzzle for distributed systems, but it gets you a long way there.
See also Vector clocks. https://en.wikipedia.org/wiki/Vector_clock
Edit: I've found these slides, which are a good primer for solving the issue, page 70 onwards "logical time":
https://ia904606.us.archive.org/32/items/distributed-systems...
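For what it's worth, the core of a Lamport clock fits in two statements if you keep the counter in a table (a sketch, not from the article; lamport_clock is a made-up single-row table and :remote_ts a parameter placeholder):

    -- on a local event, or before sending a message: tick
    UPDATE lamport_clock SET ts = ts + 1 RETURNING ts;
    -- on receiving a message stamped remote_ts: jump past it
    UPDATE lamport_clock SET ts = GREATEST(ts, :remote_ts) + 1 RETURNING ts;

Note the numbers are monotonic per node but not globally unique; Lamport timestamps are usually paired with a node id to break ties, which gives you a total order but not gap-free numbering.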
The "unique monotonically-increasing offset number" use case works just fine. I need a unique sequence number in ascending order doesn't (your problem). Why you need two queue to share the same sequence object is your problem I think.
Another way to speed it up is to grab unique numbers in batches instead of just getting them one at a time. No idea why you want your numbers to be in absolute sequence. That's hard in a distributed system. Probably best to relax that constraint and find some other way to track individual pieces of data. Or even better, find a way so you don't have to track individual rows in a distributed system.
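Batch grabbing in Postgres can be as simple as this (a sketch; my_seq is a made-up sequence):

    -- reserve 100 ids in one round trip
    SELECT nextval('my_seq') FROM generate_series(1, 100);
    -- or let each connection pre-allocate a chunk server-side
    ALTER SEQUENCE my_seq CACHE 100;

Either way you trade gaps and strict ordering for throughput, which is fine if all you actually need is uniqueness.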
The article describes using a dedicated table for the counter, one row per target table, updated in the same transaction (so parallel writers to the same table wait for each other via the lock on that row).
If you would rather have readers waiting and parallel writers there is a more complex scheme here: https://blog.sequinstream.com/postgres-sequences-can-commit-...
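As I read it, the article's scheme boils down to something like this (a sketch with made-up names; the row lock on log_counter is what serializes the writers, and the application feeds the RETURNING value into the INSERT):

    BEGIN;
    UPDATE log_counter
       SET last_seq = last_seq + 1
     WHERE table_name = 'events'
     RETURNING last_seq;            -- blocks here until earlier writers finish
    INSERT INTO events (seq, payload) VALUES (:last_seq, :payload);
    COMMIT;

Because the row lock is only released at commit, no writer can take seq n+1 until seq n is committed, and if the holder of n rolls back, the counter increment rolls back with it, so the next writer simply reuses n. Readers never see a gap that will be filled in later.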
In the article they just don't need that for the work-queue case, and instead do "SELECT ... FOR UPDATE SKIP LOCKED" to make sure things get picked up exactly once.
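That's the classic work-queue pattern, which needs no global ordering at all (a sketch, assuming a made-up jobs table):

    BEGIN;
    SELECT id, payload
      FROM jobs
     WHERE done = false
     ORDER BY id
     LIMIT 1
       FOR UPDATE SKIP LOCKED;   -- competing workers skip rows already claimed
    -- ... perform the job ...
    UPDATE jobs SET done = true WHERE id = :id;
    COMMIT;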
The article speaks of two use cases, a work queue and a pub/sub event log. You're talking about the first, and the comment you're replying to is about the latter. You need event sequence numbering for the pub/sub event log.
In a sense this is what Kafka IS architecturally: The component that assigns event sequence numbers.
The log_counter table tracks this. It's true that a naive solution using sequences does not work for exactly the reason you say.
You can fill in a noop for sequence number 123 after a timeout. You also need to be able to kill old transactions so that the transaction which was assigned 123 isn't just chilling out (which would block writing the noop).
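A sketch of that gap-filling step (names and the 30-second timeout are made up; step 1 has to come first, or the noop INSERT would block on the in-doubt row):

    -- 1. evict writers that have been sitting on a sequence number too long
    SELECT pg_terminate_backend(pid)
      FROM pg_stat_activity
     WHERE state = 'idle in transaction'
       AND xact_start < now() - interval '30 seconds';
    -- 2. plug the hole so readers can advance past it
    INSERT INTO events (seq, kind) VALUES (123, 'noop')
        ON CONFLICT (seq) DO NOTHING;   -- harmless if 123 committed after all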
Another approach, which I used in the past, was to assign sequence numbers after committing. Basically, a separate process periodically scans the set of un-sequenced rows, applies any application-defined ordering constraints, and writes SNs into them. This can be surprisingly fast, like tens of thousands of rows per second. In my case the ordering constraints were simple, basically that for a given key, increasing versions get increasing SNs. But I think you could have more complex constraints, although it might get tricky at batch boundaries.
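A sketch of one sequencer pass under that scheme (made-up schema: events has a nullable seq column plus key and version; a single sequencer process runs this, so a window function over max(seq) is enough):

    WITH base AS (
      SELECT coalesce(max(seq), 0) AS last FROM events
    ), numbered AS (
      SELECT e.id,
             base.last + row_number() OVER (ORDER BY e.key, e.version) AS sn
        FROM events e, base
       WHERE e.seq IS NULL
    )
    UPDATE events
       SET seq = numbered.sn
      FROM numbered
     WHERE events.id = numbered.id;

The ORDER BY inside the window function is where the "increasing versions get increasing SNs" constraint is enforced.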
My approach is: select max(id), and commit with id = max(id) + 1. If the commit worked, all good. If the commit failed because of a unique-index violation, repeat the transaction from the beginning. I think it works correctly under a proper transaction isolation level.
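For reference, a sketch of that loop (the application retries on a unique-violation error; events and :payload are made up):

    -- requires a unique index on events.id
    BEGIN;
    INSERT INTO events (id, payload)
    SELECT coalesce(max(id), 0) + 1, :payload
      FROM events;
    COMMIT;   -- on unique_violation: ROLLBACK and retry from BEGIN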
That limits you to a few tens of TPS, since every writer is contending for the same next id and the conflicts must be resolved serially. I wouldn't start out with that solution, since it'll be painful to change to something more scalable later. Migrating to something better will probably involve more writes per txn during the migration, so it gets even worse before it gets better.
The counter in another table used in the article also serializes all writers to the table. Probably better than the max() approach but still serial.
There needs to be serialization happening somewhere, either by writers or readers waiting for their turn.
What Kafka "is" in my view is simply the component that assigns sequential event numbers. So if you publish to Kafka, Kafka takes the same locks...
The way to increase throughput is to add more partitions (shards) to a topic.
Does the additional read query cause concern? Or is this mostly OK? (I'm sure the answer depends on scale.)
> unique monotonically-increasing offset number
Isn't it a bit of a white-whale idea that a UMION can solve all one's subscriber problems? AFAIK even with Kafka this isn't completely watertight.
I have this problem in the system I work on - the short nuance-less answer from my experience is that, once your scale gets large enough, you can't prevent ordering issues entirely and you have to build the resilience into the architecture and the framing of the problem. You often end up paying for consistency with latency.
I think you may be talking past each other. In the approach taken in the article and the parent comment, if the event sequence number allocation of the writer races the reader cursor position in the wrong way, events will NEVER BE DELIVERED.
So it is a much more serious issue at stake here than event ordering/consistency.
As it happens, if you use event log tables in SQL "the Kafka way" you actually get a guarantee on event ordering too, as a side effect, but that is not the primary goal.
More detailed description of problem:
https://github.com/vippsas/mssql-changefeed/blob/main/MOTIVA...
Funnily enough, I was just designing a queue exactly this way, so thanks for catching this. (ChatGPT, meanwhile, was assuring me the approach was airtight.)
you're really trying to vibe architect?
Gotta make a living somehow
What about a `DEFERRABLE INITIALLY DEFERRED` trigger that increments a sequence only on commit?
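Presumably something like this (a sketch with made-up names; note the caveat below):

    CREATE OR REPLACE FUNCTION stamp_seq() RETURNS trigger AS $$
    BEGIN
      -- stamp the row at end-of-transaction rather than at INSERT time
      UPDATE events SET seq = nextval('event_seq') WHERE id = NEW.id;
      RETURN NULL;
    END;
    $$ LANGUAGE plpgsql;

    CREATE CONSTRAINT TRIGGER stamp_seq_on_commit
      AFTER INSERT ON events
      DEFERRABLE INITIALLY DEFERRED
      FOR EACH ROW EXECUTE FUNCTION stamp_seq();

That narrows the window but I don't think it closes it: a deferred constraint trigger fires at the end of the transaction yet still before the commit itself, so two transactions can in principle still take their numbers in one order and become visible in the other.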