The article offers a simplified world model, Poisson arrivals and an infinite queue, which is fine as a math model.
In the real world, however, bursts can be correlated, due to factors like timeouts/retries and thundering herds.
So the real economics of a load-balanced system is a simple reliability story: being able to reasonably serve peak traffic, which leads to over-provisioning those systems.
Using the cloud allows some degree of scaling resources up and down, but doesn't completely solve the problem. I think migrating away from synchronous systems towards async systems, and letting clients gradually absorb the delays, is a better approach than forcing infrastructure to scale up and down dynamically and being billed per request-second by your cloud provider.
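A minimal sketch of the "let clients absorb the delay" idea, assuming an in-process asyncio.Queue feeding a fixed pool of workers (the service time, worker count and function names are all hypothetical). A burst of jobs is accepted at once and drained at a fixed rate, so the burst shows up as waiting time for clients rather than as extra capacity you have to provision:

```python
import asyncio
import time

SERVICE_TIME = 0.01  # hypothetical: seconds of work per job
WORKERS = 2          # hypothetical: fixed capacity, sized for average load


async def worker(queue: asyncio.Queue, results: dict) -> None:
    # Each worker pulls jobs at its own pace; the queue holds the burst.
    while True:
        job_id, submitted = await queue.get()
        await asyncio.sleep(SERVICE_TIME)  # simulate doing the work
        results[job_id] = time.monotonic() - submitted  # delay seen by the client
        queue.task_done()


async def run_burst(n_jobs: int) -> dict:
    queue: asyncio.Queue = asyncio.Queue()  # unbounded here; bound it in practice
    results: dict = {}
    workers = [asyncio.create_task(worker(queue, results)) for _ in range(WORKERS)]
    # A correlated burst: every job arrives at (almost) the same instant.
    for job_id in range(n_jobs):
        queue.put_nowait((job_id, time.monotonic()))
    await queue.join()  # wait until the backlog has been drained
    for w in workers:
        w.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return results


if __name__ == "__main__":
    delays = asyncio.run(run_burst(20))
    print(f"max client-visible delay: {max(delays.values()):.2f}s")
```

In a real system the queue would be durable (a broker or a job table) and bounded, and clients would poll or be called back with their job's result; the sketch only shows that fixed capacity plus a queue turns peak load into delay.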
I'd argue, though, that there is a political and ideological dimension to queuing theory, especially when it comes to systems where humans are handling the work.
In the idealized case, the comfortable place to operate a system is at around 2/3 utilization: around there, latency (customer experience, employee experience) is reasonable, slack is reasonable, etc.
A manager who is being managed by a manager who is being managed by a manager who is being managed (...) is going to see 0.99 utilization and want that last 0.01, oblivious to the fact that the math says the system is already past the breaking point: customers are furious, employees are worn out. Any slack at all seems like an affront.
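For the M/M/1 model the article uses (Poisson arrivals, exponential service times, one server, infinite queue), the mean time a job spends in the system is S / (1 - ρ), where S is the mean service time and ρ the utilization. A small sketch that tabulates this (function names are illustrative):

```python
def mm1_response_time(utilization: float, service_time: float = 1.0) -> float:
    """Mean time in system (waiting + service) for an M/M/1 queue."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("an M/M/1 queue is only stable for 0 <= utilization < 1")
    return service_time / (1.0 - utilization)


def mm1_jobs_in_system(utilization: float) -> float:
    """Mean number of jobs in the system, via Little's law L = lambda * T.

    With service_time = 1, the arrival rate lambda equals the utilization.
    """
    return utilization * mm1_response_time(utilization, service_time=1.0)


if __name__ == "__main__":
    for rho in (0.5, 2 / 3, 0.9, 0.99, 0.999):
        print(f"utilization={rho:.3f}  "
              f"response={mm1_response_time(rho):8.1f} x service time  "
              f"jobs in system={mm1_jobs_in_system(rho):8.1f}")
```

At ρ = 2/3 a job spends on average 3 service times in the system; at 0.99 it is 100, and pushing from 0.99 to 0.999 buys about 0.9% more throughput for 10x the waiting. Burstier, correlated arrivals generally make these numbers worse, not better.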
Another technique to add to the mix, if you can handle the additional complexity, is load or feature shedding. If you can delay or simply drop expensive, non-essential application features during exactly the window when you need to scale or absorb a burst, your system has more capacity left for the core application logic that handles requests. This can prevent the system from getting wedged in a positive feedback loop.
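A minimal sketch of feature shedding, assuming some normalized load signal in [0, 1] is already available (CPU, queue depth over capacity, in-flight over limit). The feature names and thresholds are hypothetical; the idea is that each optional feature has a load level above which it is skipped, while the core path always runs:

```python
from dataclasses import dataclass, field


@dataclass
class FeatureShedder:
    """Decide which optional features to run given the current load.

    'load' is any normalized signal in [0, 1]. Feature names and
    thresholds are hypothetical examples.
    """
    # Load level at which each optional feature is switched off.
    shed_thresholds: dict[str, float] = field(default_factory=dict)

    def enabled(self, feature: str, load: float) -> bool:
        threshold = self.shed_thresholds.get(feature)
        # Features without a threshold are treated as core and never shed.
        return threshold is None or load < threshold


def handle_request(load: float, shedder: FeatureShedder) -> dict:
    response = {"core": "order placed"}  # the core path always runs
    if shedder.enabled("recommendations", load):
        response["recommendations"] = "computed"  # expensive, shed first
    if shedder.enabled("analytics_event", load):
        response["analytics"] = "recorded"  # cheaper, shed only under heavier load
    return response


if __name__ == "__main__":
    shedder = FeatureShedder({"recommendations": 0.7, "analytics_event": 0.9})
    for load in (0.5, 0.8, 0.95):
        print(load, sorted(handle_request(load, shedder)))
```

In practice you would likely add some hysteresis (switch a feature back on at a lower load than the one that switched it off) so features don't flap at the threshold.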
See also the gamedev technique of having sacrificial assets or code, so when you need to free up space late in the schedule to ship, you have something you can actually shed.
Having something intentionally non-essential to cut is much better than discovering under pressure that everything is load-bearing.
It’s also an old devops trick to put a few gigabyte-sized files on the filesystem of critical systems (databases, ...) when there is no way to dynamically add more space/volumes and monitoring is not trusted, so they can be deleted to free space in an emergency.
Bit hard to explain though.
I think sacrificial ballast would be a good term, https://en.wikipedia.org/wiki/Ballast
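A minimal sketch of such a ballast file, assuming a plain local filesystem (function names are hypothetical). It writes real zero bytes rather than seeking to the end, because a sparse file would not actually reserve any blocks:

```python
import os

CHUNK = 1024 * 1024  # write in 1 MiB chunks to avoid one huge buffer


def create_ballast(path: str, size_bytes: int) -> None:
    """Reserve disk space by writing a file of real (non-sparse) zeros."""
    zeros = bytes(CHUNK)
    with open(path, "wb") as f:
        remaining = size_bytes
        while remaining > 0:
            n = min(CHUNK, remaining)
            f.write(zeros[:n])
            remaining -= n
        f.flush()
        os.fsync(f.fileno())  # push the data to disk so the space is really consumed


def release_ballast(path: str) -> int:
    """Delete the ballast to free space in an emergency; returns bytes freed."""
    try:
        size = os.path.getsize(path)
        os.remove(path)
        return size
    except FileNotFoundError:
        return 0
```

Give the file an obvious name (e.g. a hypothetical DELETE_ME_IN_EMERGENCY) so whoever is on call knows what it is for. On filesystems with transparent compression, zeros may not reserve space at all; random bytes would be the safer filler there.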
As in the classic paper "Wide-area traffic: The failure of Poisson modeling" by Vern Paxson (author of GNU flex) and Sally Floyd (legend in the world of TCP/IP congestion control):
https://www.osti.gov/servlets/purl/10107457
> the bursts can be correlated
Hawkes processes are what other fields use to model this.
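A minimal sketch of a Hawkes process with an exponential kernel, simulated with Ogata's thinning algorithm: every arrival temporarily raises the arrival rate (think retries after a timeout), and with alpha = 0 it reduces to a plain Poisson process. All parameter values are hypothetical; the dispersion index (variance over mean of counts per window) is about 1 for Poisson traffic and larger for bursty traffic:

```python
import math
import random
import statistics


def simulate_hawkes(mu: float, alpha: float, beta: float, horizon: float,
                    seed: int = 0) -> list[float]:
    """Event times on (0, horizon] for a Hawkes process with intensity
    lambda(t) = mu + sum_{t_i < t} alpha * exp(-beta * (t - t_i)),
    generated with Ogata's thinning algorithm.
    """
    if alpha / beta >= 1:
        raise ValueError("branching ratio alpha/beta must be < 1 for stationarity")
    rng = random.Random(seed)
    t = 0.0
    excitation = 0.0  # sum of decayed kicks from past events, at time t
    events: list[float] = []
    while True:
        # The intensity only decays until the next event, so its current
        # value is a valid upper bound for proposing the next candidate.
        lam_bar = mu + excitation
        w = rng.expovariate(lam_bar)
        t += w
        if t > horizon:
            return events
        excitation *= math.exp(-beta * w)  # decay to the candidate time
        if rng.random() * lam_bar <= mu + excitation:
            events.append(t)      # accept: this arrival...
            excitation += alpha   # ...raises the rate of future ones


def dispersion_index(events: list[float], horizon: float, window: float) -> float:
    """Variance-to-mean ratio of counts per window: ~1 for Poisson, >1 if bursty."""
    n_bins = int(horizon // window)
    counts = [0] * n_bins
    for t in events:
        i = int(t // window)
        if i < n_bins:  # ignore a trailing partial window
            counts[i] += 1
    return statistics.variance(counts) / statistics.mean(counts)


if __name__ == "__main__":
    T = 5000.0
    poisson = simulate_hawkes(mu=1.0, alpha=0.0, beta=1.0, horizon=T, seed=1)
    hawkes = simulate_hawkes(mu=1.0, alpha=0.5, beta=1.0, horizon=T, seed=1)
    print("poisson", len(poisson), dispersion_index(poisson, T, 10.0))
    print("hawkes ", len(hawkes), dispersion_index(hawkes, T, 10.0))
```

With branching ratio n = alpha/beta, the long-run rate is mu / (1 - n), so the self-excited stream above carries about twice the traffic of the Poisson one and arrives in noticeably clumpier windows.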
>In the real world however, the bursts can be correlated
Very true, as application-layer load balancing often explicitly pre-bakes the traffic schedule onto several hundred distributed IPs for data locality, essentially bypassing the functional need for DNS and local round-robin traffic balancers.
One trades concurrent bandwidth for slightly higher latency, and dynamically adapted capacity as traffic load changes. =3
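One common way to let clients pick their own hosts from a pre-distributed list, without DNS or a central balancer, is rendezvous (highest-random-weight) hashing. This is a swapped-in illustration, not necessarily what the commenter uses; the host addresses and function names are hypothetical. Each client ranks hosts by a stable hash of (client, host), so the mapping is deterministic and removing a host only moves the clients that had it in their top k:

```python
import hashlib


def _score(client_id: str, host: str) -> int:
    # Stable 64-bit score per (client, host) pair: same inputs give the
    # same score on every machine, so no coordination is needed.
    digest = hashlib.sha256(f"{client_id}|{host}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def preferred_hosts(client_id: str, hosts: list[str], k: int = 3) -> list[str]:
    """Rendezvous hashing: rank hosts by score for this client, return the top k."""
    return sorted(hosts, key=lambda h: _score(client_id, h), reverse=True)[:k]


if __name__ == "__main__":
    hosts = [f"10.0.{i // 256}.{i % 256}" for i in range(300)]  # hypothetical pool
    print(preferred_hosts("user-42", hosts))
```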
If your clients are all this well behaved, then you’re definitely not exposed to the public internet.
The global edge networks that I’m aware of all use L4 LBs and L7 LBs. Cloudflare picks anycast over DNS LB, but DNS LB is still widely used.
I don’t see these things changing.
> I don’t see these things changing.
Time Division Multiplexing is already commonly used in cellular and Wi-Fi wireless protocols. It only takes slight modification to turn it into an effective network traffic balancer that avoids the naive "everyone updates on Tuesday 6am UTC" or "it is Christmas morning and game registration is open" scenarios.
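A minimal sketch of the scheduling side of this idea, assuming each client has a stable ID and the operator picks a repeating cycle split into fixed slots (the names, the 24h cycle and the 96 slots are hypothetical). Hashing the client ID to a slot spreads a "Tuesday 6am" stampede of N clients into roughly N / n_slots per slot:

```python
import hashlib
from collections import Counter


def assigned_slot(client_id: str, n_slots: int) -> int:
    """Map a client to a fixed slot in a repeating cycle (TDM-style)."""
    digest = hashlib.sha256(client_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % n_slots


def slot_start(client_id: str, cycle_start: float, cycle_seconds: float,
               n_slots: int) -> float:
    """Time at which this client may start its update in the current cycle."""
    slot_len = cycle_seconds / n_slots
    return cycle_start + assigned_slot(client_id, n_slots) * slot_len


def peak_per_slot(client_ids: list[str], n_slots: int) -> int:
    """Largest number of clients sharing one slot: the peak to provision for."""
    return max(Counter(assigned_slot(c, n_slots) for c in client_ids).values())


if __name__ == "__main__":
    clients = [f"client-{i}" for i in range(10_000)]
    print("everyone at once:", peak_per_slot(clients, 1))
    print("96 slots of 15 min:", peak_per_slot(clients, 96))
```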
Notably, it also allows tracking specific accounts by encoding disjoint ingress host lists (siloed concurrent user groups with client certs and firewall whitelist rules). Users also lack global network knowledge, since hosts are cycled into temporary stewardship under load. Thus, only the coordinators for one-time new-user registration operate on classical DNS/round-robin host services.
With DNS, by design, everyone learns the globally published ingress points within minutes. Under a DoS the traffic just hammers those points, and small firms usually end up paying for Cloudflare-like services.
For systems I've known, TDM reduced peak resource capacity costs by around 37x. Generally speaking, a 100-user group having fun will not share their server details/invites with folks who exhibit lag-switching or other network shenanigans.
But you are correct, in that it doesn't help if PIBKAC. =3
We're definitely talking about different things. The open internet is hostile and you'll always need load shedding when the clients misbehave.
If you control all clients and servers and are on a closed network, you can do all sorts of fun things... though load shedding is still helpful for when your good clients turn bad due to a code bug. A self-DoS is the worst kind of outage.
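A minimal sketch of load shedding as admission control, assuming a fixed in-flight limit (the class name and the limit are hypothetical; real systems often adapt the limit dynamically). Requests beyond the limit are rejected immediately with a 503-style result instead of being queued, which keeps latency bounded for admitted requests and stops misbehaving or buggy clients from piling up work:

```python
import threading


class LoadShedder:
    """Reject work beyond a fixed in-flight limit instead of queueing it."""

    def __init__(self, max_in_flight: int):
        self._slots = threading.BoundedSemaphore(max_in_flight)

    def try_handle(self, handler, *args):
        # Non-blocking acquire: if every slot is busy, fail fast.
        if not self._slots.acquire(blocking=False):
            return 503, "overloaded, retry later"
        try:
            return 200, handler(*args)
        finally:
            self._slots.release()  # always free the slot, even on error
```

Failing fast only helps if clients back off on a 503 (with jitter) instead of retrying immediately; otherwise the rejections themselves become the retry storm.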