Cool project. Every time one of these projects comes up, I'm always somewhat disappointed it isn't an open source / postgres version of GCP Cloud Tasks.

All I ever want is a queue where I submit a message and it then hits an HTTP endpoint with that message as a POST. It is such a better system than dedicated long-running worker listeners, because then you can just scale your HTTP workers as needed. Pairs extremely well with autoscaling Cloud Functions, but could be anything really.

I also find that DAGs tend to get ugly really fast because they generally involve logic. I'd prefer that logic not be tied into the queue implementation, because it becomes harder to unit test. Much easier to reason about if you have the HTTP endpoint create a new task, if it needs to.
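The push-queue pattern described above can be sketched in a few lines: the queue POSTs each message to an HTTP endpoint, so scaling workers is just scaling HTTP servers. This is a minimal stdlib-only sketch; the `/tasks` path and handler logic are illustrative assumptions, not any particular product's API.

```python
# Push-queue worker sketch: the queue delivers each message as an
# HTTP POST; a 2xx response acks the task, non-2xx makes it retry.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def handle_task(payload: dict) -> int:
    # ... do the actual work with the message payload ...
    # return the status code the queue should see
    return 204 if "job" in payload else 400

class TaskEndpoint(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers["Content-Length"]))
        status = handle_task(json.loads(body))
        self.send_response(status)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("", 8080), TaskEndpoint).serve_forever()
```

Because `handle_task` is a plain function, it can be unit tested without the queue or the HTTP layer at all.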

We actually have support for that, we just haven't migrated the doc over to v1 yet: https://v0-docs.hatchet.run/home/features/webhooks. We'll send a POST request for each task.

> It is such a better system than dedicated long-running worker listeners, because then you can just scale your HTTP workers as needed.

This depends on the use-case - with long running listeners, you get the benefit of reusing caches, database connections, and disk, and from a pricing perspective, if your task spends a lot of time waiting for i/o operations (or waiting for an event), you don't get billed separately for CPU time. A long-running worker can handle thousands of concurrently running functions on cheap hardware.
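The economics point above can be illustrated with plain asyncio: a single long-running process multiplexes thousands of concurrent handlers that spend most of their time waiting, so no CPU is burned (or billed) while they sleep. The sleep here stands in for any i/o wait; the numbers are illustrative.

```python
# One long-running worker, thousands of concurrent i/o-bound "tasks":
# while each handler awaits i/o, the process is idle, not billed CPU.
import asyncio

async def handle(task_id: int) -> int:
    await asyncio.sleep(0.01)  # stands in for a DB query, HTTP call, or event wait
    return task_id

async def run_all(n: int) -> list[int]:
    # n handlers in flight concurrently, all in one process
    return list(await asyncio.gather(*(handle(i) for i in range(n))))

results = asyncio.run(run_all(2000))
```

Per-request HTTP workers can't share this event loop, nor the warm caches and pooled database connections a resident process accumulates.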

> I also find that DAGs tend to get ugly really fast because they generally involve logic. I'd prefer that logic not be tied into the queue implementation, because it becomes harder to unit test. Much easier to reason about if you have the HTTP endpoint create a new task, if it needs to.

We usually recommend that DAGs which require too much logic (particularly fanout to a dynamic number of workflows) should be implemented as a durable task instead.

Thanks for your response. Webhooks are literally the last thing documented, which says to me that they aren't a focus for your product at all.

I used to work for a company that used long-running listeners. More often than not, they would get into a state where (for example) they needed to upgrade some code and now had all these long-running jobs (some would go for 24 hours!) that, if stopped, would screw everything up down the line, because restarting them would take so long to finish that it would impact customer-facing data. Just like DAGs, it sounds good on paper, but it is a terrible design pattern that will eventually bite you in the ass.

The better solution is to divide and conquer. Break things up into smaller units of work and submit more messages to the queue. This way, you can stop at any point and you won't lose hours' worth of work. The way to force this on developers is to set constraints on how long things can execute. Make them think about what they are building, and build idempotency into things.
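The divide-and-conquer shape described above can be sketched as a task that handles a bounded chunk, skips already-done items (idempotency under redelivery), and re-enqueues the remainder. `enqueue` and the `done` set are illustrative stand-ins for your real queue client and progress store.

```python
# Chunked, idempotent task sketch: bounded work per invocation, with
# the remainder re-submitted to the queue instead of running for hours.
def process_chunk(items, done, enqueue, chunk_size=100):
    chunk, rest = items[:chunk_size], items[chunk_size:]
    for item in chunk:
        if item in done:      # idempotency: safe if the message is redelivered
            continue
        done.add(item)        # stands in for the real side effect
    if rest:
        enqueue(rest)         # small message -> only a small unit of work at risk
```

With a 250-item job and a 100-item budget, the first invocation handles 100 items and enqueues the remaining 150 as a fresh task, so a restart or deploy in between costs minutes, not hours.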

The fact that you're building a system that supports all these footguns seems terrifying. "Usually recommend" is undesirable; people will always find ways to use things in ways you don't expect. I'd much rather work with a more constrained system than one trying to be all things to all people. Cloud Tasks does a really good job of just doing one thing well.

Admittedly, webhook workers aren't exactly this, since we send multiple tasks to the same endpoint, whereas I believe you can register one endpoint per task with Cloud Tasks. Although this is not a large change.

I use a router on my end, so it would always be one endpoint anyway. The problem with Cloud Tasks is that the more individual tasks you create, the more time it takes to deploy. Better to hide all that behind a single router.

Cloud Tasks is excellent and I've been wanting something similar for years.

I’ve been occasionally hacking away at a proof of concept built on riverqueue, but have eased off for a while due to the performance issues that become obvious with non-partitioned tables, and just general laziness.

https://github.com/jarshwah/dispatchr if curious but it doesn’t actually work yet.

Developer of River here ( https://riverqueue.com ). I'm curious if you ran into actual performance limitations based on specific testing and use cases, or if it's more of a hypothetical concern. Modern Postgres running on modern hardware and with well-written software can handle many thousands or tens of thousands of jobs per second (even without partitioning), albeit that depends on your workload, your tuning / autovacuum settings, and your job retention time.

Perceived only at this stage, though the kind of volume we’re looking at is 10s to 100s of millions of jobs per day. https://github.com/riverqueue/river/issues/746 talks about some of the same things you mention.

To be clear, I really like the model of riverqueue and will keep going at a leisurely pace since this is a personal time interest at the moment. I’m sick of celery and believe a service is a better model for background tasks than a language-specific tool.

If you guys were to build http ingestion and http targets I’d try and deploy it right away.

Ah, so that issue is specifically related to a statistics/count query used by the UI and not by River itself. I think it's something we'll build a more efficient solution for in the future because counting large quantities of records in Postgres tends to be slow no matter what, but hopefully it won't get in the way of regular usage.

> Perceived only at this stage, though the kind of volume we’re looking at is 10s to 100s of millions of jobs per day.

Yeah, at the low end of that range, 10 million/day is a little over 100 jobs/sec sustained :) Shouldn't be much of an issue on appropriate hardware and with a little tuning, in particular to keep your jobs table from growing to more than a few million rows and to vacuum frequently. Definitely hit us up if you try it and start having any trouble!

Yeah, I also like this system. The only problem I was facing with it was that long HTTP requests lead to timeouts/lost connections, and task queues specifically have a 30 min execution limit. But I really like how it separates the queueing logic from the whole application/execution graph. Task queues are one of my favourite pieces of cloud infrastructure.

How do you deal with cloud tasks in dev/test?

Great question.

I built my own super simple router abstraction. Message comes in, goes into my router, which sends it to the right handler.

I only test the handler itself, without any need for the higher level tasks. This also means that I'm only thinly tied to GCP Tasks and can migrate to another system by just changing the router.
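The router abstraction described above can be sketched as a registry of plain functions keyed by task name: every message hits one endpoint, the router dispatches on a name field, and tests call the handlers directly with no GCP involved. The handler names and message shape here are illustrative assumptions.

```python
# Router sketch: one endpoint for all tasks, dispatch by task name.
HANDLERS = {}

def task(name):
    """Register a plain function as the handler for a task name."""
    def register(fn):
        HANDLERS[name] = fn
        return fn
    return register

@task("resize_image")
def resize_image(payload):
    return f"resized {payload['path']}"

def route(message):
    # the single HTTP endpoint hands every incoming message to this function
    return HANDLERS[message["task"]](message["payload"])
```

A unit test just calls `resize_image(...)` directly, and swapping Cloud Tasks for another queue only changes whatever invokes `route`.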

What we did was mock it to make the HTTP request blocking.
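That mock-to-blocking approach can be sketched as swapping the enqueue call for a stub that invokes the handler synchronously, so in tests the whole chain runs inline and assertions can run right after. All names here are illustrative; in production the real enqueue would hit the queue API and return before the task runs.

```python
# Test-time sketch: replace the fire-and-forget enqueue with a
# blocking stub that runs the handler inline and records the result.
def handler(payload):
    return {"handled": payload["id"]}

def enqueue_real(payload):
    # would POST to the queue API and return before the task executes
    raise NotImplementedError("not used in tests")

def submit(payload, enqueue=enqueue_real):
    enqueue(payload)

# in a test, inject a synchronous stand-in for the queue:
calls = []
submit({"id": 7}, enqueue=lambda p: calls.append(handler(p)))
```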

Alternatively, you can use ngrok (or similar) and a test task queue that calls your service running on localhost, tunneled via ngrok.