I want to rewrite some of my setup, we're doing IoT, and I was planning on
MQTT -> Redpanda (for message logs and replay, etc) -> Postgres/Timescaledb (for data) + S3 (for archive)
(and possibly Flink/RisingWave/Arroyo somewhere in order to do some alerting/incrementally updated materialized views/ etc)
this seems "simple enough" (though I don't have any experience with Redpanda), but it is one more moving part compared to MQTT -> Postgres (as a queue) -> Postgres/Timescaledb + S3
Questions:
1. my "fear" would be that if I use the same Postgres for the queue and for my business database, the "message ingestion" part could block the "business" part sometimes (locks, etc)? Also, when I want to update the schema of my database without "stopping" the inflow of messages, I'm not sure this would be easy?
2. also that since it would write messages in the queue and then delete them, there would be a lot of GC/Vacuuming to do, compared to my business database which is mostly append-only?
3. and if I split the "Postgres queue" from "Postgres database" as two different processes, of course I have "one less tech to learn", but I still have to get used to pgmq, integrate it, etc, is that really much easier than adding Redpanda?
4. I guess most Postgres queues are also "simple" and don't provide "fanout" for multiple things (e.g. I want to take one of my IoT messages, clean it up, store it in my timescaledb, and also archive it to S3, and also run an alert detector on it, etc)
What would be the recommendation?
Re 1. Look up non-blocking migrations for postgres. You can generally do large schema migrations while only briefly taking exclusive locks. It's a common mistake to perform a blocking migration and lock up your database (e.g. using CREATE INDEX on an existing table instead of CREATE INDEX CONCURRENTLY).
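A minimal sketch of the difference, assuming a hypothetical readings table:

    -- Blocking: takes a lock that stalls writes for the whole build.
    CREATE INDEX readings_device_idx ON readings (device_id);

    -- Non-blocking: builds in the background while writes continue.
    -- (Slower overall, and cannot run inside a transaction block.)
    CREATE INDEX CONCURRENTLY readings_device_idx ON readings (device_id);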
There are globally shared resources, but for the most part, locks are held on specific rows or tables. Unrelated transactions generally won't block on each other.
Also running a Very High Availability cluster is non-trivial. It can take a minute to fail over to a replica, and a busy database can take a while to replay the WAL after a reboot before it's functional again. Most people are OK with a couple minutes of downtime for the occasional reboot though.
I think this really depends on your scale. Are you doing <100 messages/second? Definitely stick with postgres. Are you doing >100k messages/second? Think about Kafka/redpanda. If you were comfortable with postgres (or you will be since you are building the rest of your project with it), then you want to stick with postgres longer, but if you are barely using it and would struggle to diagnose an issue, then you won't benefit from consolidating.
Postgres will also be more flexible. Kafka can only do partitions and consumer groups, so if your workload doesn't look like that (e.g. out of order processing), you might be fighting Kafka.
My suggestion would be even simpler:
MQTT -> Postgres (+ S3 for archive)
> 1. my "fear" would be that if I use the same Postgres for the queue and for my business database...
This is a feature, not a bug. It lets you pair the handling of a message with the business data changes it causes, all in the same transaction. This isn't quite "exactly-once" handling, but it's really, really close!
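As a sketch of what that pairing looks like (table and column names are hypothetical, and the literal ids stand in for values your worker would carry over):

    BEGIN;
    -- Claim one unconsumed message; SKIP LOCKED lets workers run in parallel.
    SELECT id, payload FROM messages
    WHERE consumed_at IS NULL
    ORDER BY id LIMIT 1
    FOR UPDATE SKIP LOCKED;
    -- Apply the business change the message implies...
    UPDATE devices SET last_seen = now() WHERE device_id = 42;
    -- ...and mark the message consumed in the same transaction:
    -- both commit together, or neither does.
    UPDATE messages SET consumed_at = now() WHERE id = 17;
    COMMIT;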
> 2. also that since it would write messages in the queue and then delete them, there would be a lot of GC/Vacuuming
Generally, best practice here is to never delete messages from a SQL "queue": toggle them in place to consumed and periodically archive them to a long-term storage table (see the sketch below). This gives you in-context historical data, which can be super helpful when you need to write a script to undo or mitigate bad code that corrupted data.
Alternatively, when you need to roll back to a previous state, this often gives you a "poor woman's undo": restore a time-stamped backup, copy over the messages which arrived since the restoration point, then let the engine run forward, processing those messages. (This is a simplification of course, not always directly possible, but data recovery is often a matter of mitigations and least-bad choices.)
Basically, saving all your messages provides both efficiency and data recovery optionality.
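To make the toggle-and-archive idea concrete, a hypothetical sketch (assuming a consumed_at column and a messages_archive table):

    -- Consuming toggles in place instead of deleting:
    UPDATE messages SET consumed_at = now() WHERE id = 17;

    -- A periodic archive pass moves old, handled rows in one statement:
    WITH moved AS (
        DELETE FROM messages
        WHERE consumed_at < now() - interval '30 days'
        RETURNING *
    )
    INSERT INTO messages_archive SELECT * FROM moved;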
> 3...
Legit concern, particularly if you're trying to design your service abstraction to match an eventual evolution of data platform.
> 4. don't provide "fanout" for multiple things
What they do provide is multiple handlers over the same queue, wherein you might have n handlers (each with its own "handled_at" timestamp column in the DB) running at different priorities. This doesn't allow for workflows (i.e. a cleanup step), but it does allow different processes to run on the same queue with different privileges or priorities. So the slow process (archive?) could run opportunistically or in batches, while time-sensitive work (alerts, outlier detection, etc) can always run instantly. Or archiving can be done by a process which lacks access to any user data, to algorithmically enforce PCI boundaries. Etc.
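A hypothetical layout for that, with one timestamp column per handler so each consumer tracks its own progress:

    ALTER TABLE messages
        ADD COLUMN stored_at   timestamptz,  -- timescaledb writer
        ADD COLUMN archived_at timestamptz,  -- slow batch archiver
        ADD COLUMN alerted_at  timestamptz;  -- time-sensitive alert check

    -- The archiver can run opportunistically, in big batches:
    SELECT id, payload FROM messages
    WHERE archived_at IS NULL
    ORDER BY id LIMIT 1000
    FOR UPDATE SKIP LOCKED;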
> This is a feature, not a bug. In this way you can pair the handling of the message with the business data changes which result in the same transaction.
That's a particularly nasty trap. Devs will start using it everywhere, and that makes it very hard to move beyond Postgres when you need to.
I’d keep a small transactional outbox for when you really need it and encourage devs to use it only when absolutely necessary.
I'm currently cleaning up an application that has reached the limit of vertical scaling with Postgres. A significant part of that is because it uses Postgres for every background work queue. Every insert into the queue happens inside a transaction; do you really want to roll back your change because a notification job couldn't be enqueued? Probably not. But the ability is there, and it is so easy to use that it gets overused.
Now I get to go back through hundreds of cases and try to determine whether the transactional insert was intentional or just someone not thinking.
The problem is that either you have this feature or you don't; misusing it is a separate problem. Not having a feature sucks, and most distributed databases will even give you options for consistent (slow-ass) reads.
If you have a database that supports transactions and something like SKIP LOCKED, you always have the option of building a transactional outbox when you need it.
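A minimal outbox sketch under those assumptions (all names hypothetical): the business write and the "please publish this" row commit atomically, and a separate relay drains the outbox.

    BEGIN;
    INSERT INTO orders (id, total) VALUES (1, 99.00);
    INSERT INTO outbox (topic, payload)
    VALUES ('order_created', '{"order_id": 1}');
    COMMIT;

    -- Relay process, polling outside any business transaction:
    SELECT id, topic, payload FROM outbox
    WHERE published_at IS NULL
    ORDER BY id LIMIT 100
    FOR UPDATE SKIP LOCKED;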
> Generally it's best practice in this case to never delete messages from a SQL "queue", but toggle them in-place to consumed and periodically archive to a long-term storage table.
Ignoring the potential uses for this data, what you suggested has the exact same effect on Postgres at a tuple level. An UPDATE is essentially the same as a DELETE + INSERT, due to Postgres's MVCC implementation. The only way around this is with a HOT update, which requires (among other things) that no indexed columns were updated. Since presumably in this schema you'd have a column like is_complete or is_deleted, and a partial index on it, as soon as you toggle it, it can't do a HOT update, so the concerns about vacuum still apply.
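Concretely, with a hypothetical schema like:

    CREATE TABLE queue (
        id          bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
        payload     jsonb NOT NULL,
        is_complete boolean NOT NULL DEFAULT false
    );
    -- Partial index so consumers find pending work quickly:
    CREATE INDEX queue_pending_idx ON queue (id) WHERE NOT is_complete;

    -- This touches a column the partial index depends on, so it cannot
    -- be a HOT update: it writes a new tuple and leaves a dead one
    -- behind for vacuum, just like DELETE + INSERT would.
    UPDATE queue SET is_complete = true WHERE id = 42;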
> This is a feature, not a bug.
Until your PostgreSQL instance goes down (even for reasons unrelated to Postgres itself), and then you have no fallback or queue to give you elasticity.
> I want to rewrite some of my setup, we're doing IoT, and I was planning on
Is this some scripting to automate your home, or are you trying to build some multi-tenant thing that you can sell?
If it's just scripting to automate your home, then you could probably get away with a single server and on-disk/in-memory queuing, maybe even sqlite, etc. Or you could use it as an opportunity to learn those technologies, but you don't really need them in your pipeline.
It's amazing how much performance you can get as long as the problem can fit onto a single node's RAM/SSD.
Another good item to consider:
n) Do you really need S3? Is it cheaper than NFS storage on a compute node with a large disk?
There are many cases where S3 is absolutely cheaper though.
In my experience NFS is always the wrong thing to use.
Your application thinks it's a normal disk, but it isn't: you get no timeouts, no specific errors for network issues, and extremely expensive calls camouflaged as quick FS ops ("was any file modified in this folder? I'll just loop over them using my standard library's nice FS utilities"). You don't get atomic ops outside of mv, invalidation and caching are complicated, and your developers probably don't know the semantics of FS operations, which are much more complex and less well documented than, e.g., Redis blob storage.
And then when you finally rip out NFS, you have thousands of lines of app and test code that assumes your blobs are on a disk in subtle ways.
Safe to assume s3fs and smb have similar issues?
I would think something like NFS is best suited for actual files, rather than blobs you're serializing through a file system API?
Re (2) there is a lot of vacuuming, but the table is small, and it's usually very fast and productive.
You can run into issues with scheduled queues (e.g. "run this job in 5 minutes"), since the tables will be bigger, you need an index, and you create the garbage in the index exactly at the point you are querying (jobs to run now). This is a spectacularly bad pattern for postgres at high volume.
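The shape of the pattern, in a hypothetical form:

    CREATE TABLE jobs (
        id      bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
        run_at  timestamptz NOT NULL,
        payload jsonb NOT NULL
    );
    CREATE INDEX jobs_run_at_idx ON jobs (run_at);

    -- Every poll scans the "left edge" of the run_at index...
    SELECT id FROM jobs
    WHERE run_at <= now()
    ORDER BY run_at LIMIT 100
    FOR UPDATE SKIP LOCKED;
    -- ...and every completed job is deleted from that same edge, so
    -- dead index entries pile up exactly where every query lands.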
> Also perhaps when I want to update the schema of my database and not "stop" the inflow of messages, not sure if this would be easy?
Doesn't PostgreSQL have transactional schema updates as a key feature? AIUI, you shouldn't see any data loss as a result of such changes. It's also common to use views in order to simplify the management of such updates.
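For example (assuming a hypothetical readings table), the whole change either commits or rolls back as a unit, so inflowing messages never see a half-applied schema:

    BEGIN;
    ALTER TABLE readings ADD COLUMN firmware text;
    -- A view can insulate consumers from the underlying layout:
    CREATE OR REPLACE VIEW readings_v1 AS
        SELECT device_id, ts, value, firmware FROM readings;
    COMMIT;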