It is a *very bad* replacement for an MQ system, for the simple reason you can't quickly and effortlessly scale int/out consumers.

Why can't you? In my experience, scaling Kafka was far easier than scaling our RabbitMQ cluster. We started running into issues when our RabbitMQ cluster hit 25k TPS, our Kafka cluster of equivalent resources didn't break a sweat at 500k TPS.

Scaling consumers, not throughput. And it’s both directions (in and out, not just out).

How did you handle re-partitioning and rebalancing every time you scaled your cluster in or out?

What sort of systems do you work on to require this kind of traffic volume? I've worked on one project that I'd consider relatively high volume (UK Post Office Horizon Online) and we were only targeting 500 TPS.

Think extremely popular multiplayer videogames. We used these systems for all the backend infra, logins, logouts, chat messages, purchases.

We often had millions of players online at a given moment which means lots of transactions!