Back in the 90s. I was part of a team working R&D project on flexible manufacturing. One of the central concepts was the use of buffers (also called decouplers) between manufacturing stations. These buffers were the space needed to place not finished products while they waited for the next station. Typically a conveyor belt. The main service is that both stations could run at slightly different rhythm, and they would not be causing problems on the other one. Either starving it or overwhelming it. Queues are pretty similar. You can have a consumer and a producer working and the queue in between prevents them from being blocked by each other. Thus I now see the main role of queues as decouplers.
Hum... The biggest hype on process engineering by the 00s was buffer size reduction. Because those buffers interfere with each other in chaotic ways, and they tend to turn "small problems that blow up soon, with small consequences" into "huge accumulated problem, that blows up hours after it appeared, with business-risking consequences".
Queues are pretty similar.
Reducing buffer size puts back pressure on the whole system, which can be valuable to manage load (but often throttles faster stages and that throttling makes people uncomfortable). A meaningful metric is how much of the buffer is used at any given time and the throughout. If the buffer is backed up, that says there's a bottle neck on the consumption side of the buffer and more bandwidth is needed there. For whatever reason, adjusting buffer sizes is the more common action taken. A buffer provides throughput management but it also provides info/metrics about the operation of the system.
You can also observe this in games like Dyson Sphere Program, (which is all workers and queues and buffers) where adding a buffer storage section of a belt only hides the fact that you are under-producing one of the components required.
The buffer smooths out bursty flow but you don't want that in the middle of the pipeline, as it actually represents mid-pipeline inefficiency. You should actually be fixing the upstream or downstream problem.
[1] or other automation games like Factorio, Mindustry
I'll note that speedrunners absolutely buffer mid-pipeline in Factorio, and not just for hand-crafting purpouses. Sometimes you're waiting for R&D, sometimes you're just running half the machines for twice as long, giving you the same output while saving on build costs. The actual bottlenecks are constantly shifting. "I'm not speedrunning!" you might say, but every regular game could've started as a speedrun that could've gotten you to where you are faster.
Understanding the tendency of mid-pipeline buffers to hide problems is useful, but scorning them entirely is also suboptimal.
That's a very good analogy. Queues are not there to solve overload (and they never were), they are there as an *architecture tool* that allows decoupling and *can* (not always) ease the scaling of the queue process (workers).
I think the back-pressure should always be implemented from the very beginning, as it also helps with defining the requirements of what the service should be able to handle
You're essentially describing what a silicon engineer would call independent "clock domains" (the stations) and "clock-domain-crossing signals" (the workpieces.) And, indeed, you would also tend to handle clock-domain-crossing signals by sticking an async FIFO between the two clock domains.
In manufacturing you have mass. Stuff has weight to it and sometimes I think it would be best to imagine data has mass.
In the widget factory there is the option to put stuff in the warehouse until you need it. Great in principle, but if you are the guy having to do the heavy lifting to get stuff crammed into the warehouse, and retrieved, then you can end up wondering why you are in the job, which promised so much more than spending all day in the warehouse rather than making stuff.
With web applications we will gladly get gigabytes of stuff from the other side of the world, just in case we need it. If all of that data weighed grams or even tonnes, then we would do things very differently, to be more like the Toyota Way, with just-in-time and the rest of it.
Hence my suggestion when building for the web, imagine every byte has mass. Design accordingly.