Now I'm more confused. An infinitely efficient system would saturate the network. An infinitely inefficient system would saturate the CPU. "The implementation saturates CPU before reaching I/O limits" is true of an infinitely inefficient system but false of an infinitely efficient one. That makes it an undesirable property, not a selling point.

The metric that actually matters is efficiency at the task, given a hardware constraint. In this context, that's entirely network throughput: streaming ability per unit of hardware, and since the hardware is held constant, you can just compare streaming ability directly.
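To make that concrete, here is a minimal sketch of the metric I mean, assuming a hypothetical 10 Gbit/s link (the constant, the function name, and the example numbers are all illustrative, not from the benchmark):

```python
# Hypothetical illustration: efficiency as achieved throughput over link capacity.
LINK_CAPACITY_BYTES_PER_S = 1_250_000_000  # assumed 10 Gbit/s NIC

def network_efficiency(bytes_streamed: int, elapsed_s: float) -> float:
    """Fraction of available network bandwidth actually used."""
    achieved_bytes_per_s = bytes_streamed / elapsed_s
    return achieved_bytes_per_s / LINK_CAPACITY_BYTES_PER_S

# e.g. 30 GB streamed in 60 s on a 10 Gbit/s link -> 0.4 (40% of line rate)
print(network_efficiency(30_000_000_000, 60.0))
```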

As a litmus test of the concept: if you rewrote this in C or Rust, would you hit the CPU bottleneck earlier or later? Would network throughput be closer to, or further from, its limit?

You're right - this represents computational duress, not optimal efficiency. The single CPU struggles to handle the 50-concurrent-user scenario, which was chosen to demonstrate worst-case behavior rather than peak performance. I intended it as a stress test of the framework, not to suggest that CPU saturation is ideal, but to highlight that performance remained predictable even at the limit.

Lower-level languages would certainly offer higher raw performance. I was hoping to showcase how Python can perform when the architecture is deliberately constrained. The goal was to show that careful design choices (bounded memory, generator-based streaming) can maintain predictable behavior even when computational resources are exhausted.
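For reference, a minimal sketch of the pattern I mean by "bounded memory, generator-based streaming"; the chunk size, function name, and the commented framework usage are illustrative assumptions, not the actual implementation:

```python
from typing import Iterator

CHUNK_SIZE = 64 * 1024  # 64 KiB per chunk keeps resident memory bounded

def stream_file(path: str) -> Iterator[bytes]:
    """Yield a file in fixed-size chunks so memory use stays O(chunk size),
    regardless of file size or how many clients are being served."""
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK_SIZE):
            yield chunk

# A framework can consume the generator lazily, e.g. (hypothetical usage):
# return StreamingResponse(stream_file("large.bin"))
```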