Unless you’re planning to use Temporal’s (temporalio’s) SaaS, you’re in for building a very large database cluster if you need any real scale.

(source: I run way more Cassandra than I ever thought reasonable)

Just got roped into setting up an on-prem Temporal cluster myself :(

What causes the need for massive database clusters? Now I'm worried this is going to fall apart on us in a very big way.

Take a look at the official “basic scaling” guide, especially the metric about state transitions per second.

To get an idea of what you’ll need that metric to be, try running 1/10th of your workload as a benchmark against it.

For our particular setup to handle barely 5,000 state transitions per second, we have almost 100 CPUs just for Cassandra. Doubling that means 200 CPUs just for the database.
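
For anyone who wants to plug in their own numbers, here's a rough back-of-the-envelope sketch of that arithmetic. It assumes database CPU scales linearly with state transitions per second, which matches the doubling above but is only an approximation, and the function names are made up for illustration. The defaults are just the figures from our setup.

```python
# Rough sizing sketch. Linear scaling is an ASSUMPTION based on one data point
# (about 100 Cassandra CPUs for about 5,000 state transitions/second).

def full_load_from_benchmark(benchmark_rate: float,
                             workload_fraction: float = 0.1) -> float:
    """Extrapolate the full-workload state-transition rate from a benchmark
    that ran only a fraction (default 1/10th) of the real workload."""
    if not 0 < workload_fraction <= 1:
        raise ValueError("workload_fraction must be in (0, 1]")
    return benchmark_rate / workload_fraction


def required_db_cpus(target_rate: float,
                     measured_rate: float = 5000.0,
                     measured_cpus: float = 100.0) -> float:
    """Scale a measured (rate, CPUs) data point linearly to a target rate."""
    if measured_rate <= 0:
        raise ValueError("measured_rate must be positive")
    return target_rate * measured_cpus / measured_rate


if __name__ == "__main__":
    # Hypothetical benchmark: 1/10th of the workload produced 1,000 transitions/s.
    full_rate = full_load_from_benchmark(1000.0)
    print(f"estimated full rate: {full_rate:.0f} transitions/s")
    print(f"estimated DB CPUs:   {required_db_cpus(full_rate):.0f}")
```

Treat the result as a lower bound for planning, not a quote: replication, compaction, and headroom all push the real number up.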

Oh, and make sure you get your history shard count right, as you can’t change it without rebuilding the cluster.

Maybe it makes sense for low-volume, high-value jobs, e.g. Uber trips. For high-volume, low-value jobs it doesn’t work economically.

We are likely to drop it.