The problem with this kind of theoretical analysis is that most load balancers don't work this way, especially the typical "cloud" HTTP or TCP load balancers. Those are stateless and avoid this kind of central queuing logic like the plague, because it doesn't scale to their traffic volumes.

For example, most cloud load balancers I've worked with are stateless and non-queuing, and they assign requests to back-ends purely at random.
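To put rough numbers on the difference, here is a minimal Python sketch (my own illustration, not from the article) comparing strictly random dispatch with a single central FIFO queue. It assumes Poisson arrivals, exponentially distributed service times with mean 1.0, and servers that each process one request at a time; under random dispatch, each server builds up its own backlog. `simulate` and its parameters are hypothetical names.

```python
import heapq
import random


def simulate(policy: str, servers: int = 10, load: float = 0.9,
             jobs: int = 200_000, seed: int = 1) -> float:
    """Mean time a request waits before a server starts working on it.

    Assumptions (illustrative only): Poisson arrivals, exponential service
    times with mean 1.0, and servers that handle one request at a time.
    """
    if policy not in ("random", "central"):
        raise ValueError(f"unknown policy: {policy}")
    rng = random.Random(seed)
    arrival_rate = load * servers      # keeps each server ~`load` busy
    free_at = [0.0] * servers          # when each server next becomes idle
    t = 0.0
    total_wait = 0.0
    for _ in range(jobs):
        t += rng.expovariate(arrival_rate)
        service = rng.expovariate(1.0)
        if policy == "random":
            # Stateless dispatch: pick any server; the request joins that
            # server's own FIFO backlog.
            s = rng.randrange(servers)
            start = max(t, free_at[s])
            free_at[s] = start + service
        else:
            # Central FIFO queue: the request goes to whichever server frees
            # up first (free_at is kept as a min-heap in this branch).
            earliest = heapq.heappop(free_at)
            start = max(t, earliest)
            heapq.heappush(free_at, start + service)
        total_wait += start - t
    return total_wait / jobs


if __name__ == "__main__":
    for policy in ("random", "central"):
        print(policy, round(simulate(policy), 2))
```

With these defaults (10 servers at 90% utilization), queueing theory predicts a mean wait of about 9.0 service times for random dispatch, since each server behaves like an independent M/M/1 queue, versus about 0.67 for the central queue (M/M/10, via the Erlang C formula). The simulated values should land roughly in that neighborhood.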

Traditional non-cloud load balancers can implement this kind of perfect queuing, but these settings are generally off by default even when available.

- NetScaler: surgeProtection + maxClient=1

- F5 BIG-IP LTM: request queuing + pool/member connectionLimit=1

- HAProxy: server maxconn 1 + timeout queue

- NGINX Plus: server max_conns=1 + queue

Envoy, Apache, and Traefik have partial or limited support.
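The HAProxy-style combination above (one connection per server plus a bounded wait in a shared queue) can be sketched in Python as follows. This only illustrates the semantics and is not how any of these products is implemented; `QueuedPool`, `dispatch`, and the "503 queue timeout" string are hypothetical. HAProxy itself does return a 503 to the client when `timeout queue` expires. Note that Python's `threading.Semaphore` does not promise strict FIFO wake-up order, so the waiting here is only approximately first-come, first-served.

```python
import threading
from typing import Callable, List


class QueuedPool:
    """Hypothetical dispatcher: each backend takes one request at a time
    (like maxconn 1); requests that find every backend busy wait in one
    shared queue for up to `queue_timeout` seconds (like `timeout queue`)."""

    def __init__(self, backends: List[str], queue_timeout: float) -> None:
        self._idle = list(backends)                       # backends with a free slot
        self._lock = threading.Lock()                     # guards self._idle
        self._slots = threading.Semaphore(len(backends))  # one permit per idle backend
        self._queue_timeout = queue_timeout

    def dispatch(self, work: Callable[[str], str]) -> str:
        # Wait until some backend is idle, or give up after the queue timeout.
        if not self._slots.acquire(timeout=self._queue_timeout):
            return "503 queue timeout"
        with self._lock:
            backend = self._idle.pop()  # never empty: we hold a permit
        try:
            return work(backend)
        finally:
            # Return the backend to the idle list before freeing its permit.
            with self._lock:
                self._idle.append(backend)
            self._slots.release()
```

A request that arrives while every backend is busy waits up to `queue_timeout`; if a backend frees up in time, the request runs there, otherwise it is rejected rather than piled onto a busy server.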

Conversely, most multi-threaded web server frameworks already do this by default! For example, ASP.NET effectively has an internal "load balancer" with a perfect queue, if you treat each core as a "node" and the whole server as the "scale-out system".
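The same structure is easy to observe in Python's `concurrent.futures.ThreadPoolExecutor`, used here purely as an analogy (this is not ASP.NET): a fixed set of worker threads, the "nodes", pull submitted work from an internal queue as they become free. The sketch below, built around a hypothetical `run_requests` helper, submits more requests than there are workers and records the peak number running at once.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def run_requests(workers: int, requests: int) -> Tuple[List[int], int]:
    """Run `requests` fake requests on `workers` threads.

    Returns the results in submission order and the peak number of
    requests that were executing at the same moment.
    """
    lock = threading.Lock()
    in_flight = 0
    peak = 0

    def handle(request_id: int) -> int:
        nonlocal in_flight, peak
        with lock:
            in_flight += 1
            peak = max(peak, in_flight)
        time.sleep(0.01)  # stand-in for real request work
        with lock:
            in_flight -= 1
        return request_id

    # Requests beyond `workers` wait in the executor's internal queue until
    # a worker thread becomes free.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(handle, range(requests)))
    return results, peak


if __name__ == "__main__":
    results, peak = run_requests(workers=4, requests=20)
    print(len(results), "requests served; peak concurrency:", peak)
```

No request is rejected or handed to a busy worker; the excess simply waits until a worker frees up, which is the central-queue behaviour a stateless front-end load balancer lacks.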

> can only handle a single concurrent request, and has no internal queuing

And the systems that load balancers front almost never behave this way...

Don't even get me started on client performance: latency, throughput, and caching can all be affected by payload size.

The article is interesting, but it is an ideal that almost never turns up in the real world.