How do the economics of your statement work out? Clearly inference providers don't have a time to ROI of 10 years on their hardware costs; and that's without even taking ongoing energy costs into account. What's missing here?

Output tokens are actually kinda expensive for the provider.

The input cache-hit tokens are incredibly cheap for them (and incredibly high margin too, except for DeepSeek).

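And input tokens are in the middle. They can be processed very efficiently because the whole prompt is handled in parallel during prefill, whereas output tokens are generated one at a time.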

Also, his math is wrong. $100k gets you 22.7B output tokens at $4.4/M, which is what GLM 5.2 costs.

At 500 tokens/s, 22.7B tokens takes only about 526 days, or about 1.44 years, which is much less than the life of the hardware.
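A quick check of that arithmetic, as a rough sketch. It assumes the $100k is the hardware budget, that the box sustains 500 output tokens/s around the clock, and that every token sells at the $4.4/M list price. Idle time, energy, and input-token revenue are all ignored.

```python
# Payback arithmetic: how many output tokens $100k buys at $4.4 per million,
# and how long a single 500 tokens/s stream would need to generate them.
# Assumes 100% utilization and ignores energy and input-token revenue.
BUDGET_USD = 100_000
PRICE_PER_M_OUTPUT_USD = 4.4   # GLM 5.2 output price quoted in the thread
TOKENS_PER_SECOND = 500        # generation rate quoted in the thread

SECONDS_PER_DAY = 86_400


def payback(budget_usd, price_per_m, tokens_per_s):
    """Return (tokens bought, days to generate them, years to generate them)."""
    tokens = budget_usd / price_per_m * 1_000_000
    days = tokens / tokens_per_s / SECONDS_PER_DAY
    return tokens, days, days / 365


tokens, days, years = payback(BUDGET_USD, PRICE_PER_M_OUTPUT_USD, TOKENS_PER_SECOND)
print(f"{tokens / 1e9:.1f}B tokens, {days:.0f} days, {years:.2f} years")
```

Swapping in a different price or throughput only changes the three constants at the top.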

The inference providers are running batch sizes much larger than 10.
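Here is an idealized sketch of why batch size matters for payback. During decode, each step produces one token for every sequence in the batch, so aggregate throughput grows with the batch. The per-sequence speed of 50 tokens/s is a hypothetical number, and the sketch assumes it stays fixed as the batch grows. In practice it degrades somewhat, so treat these figures as an upper bound on the speedup.

```python
# Idealized payback vs. batch size. Aggregate decode throughput is modeled as
# per-sequence speed times batch size. Hypothetical assumption: per-sequence
# speed does not drop as the batch grows (real systems degrade somewhat).
SECONDS_PER_DAY = 86_400


def payback_days(tokens_to_sell, per_stream_tps, batch_size):
    """Days needed to generate tokens_to_sell with batch_size concurrent streams."""
    aggregate_tps = per_stream_tps * batch_size
    return tokens_to_sell / aggregate_tps / SECONDS_PER_DAY


TOKENS = 22.7e9       # tokens that $100k buys at $4.4/M, from the thread
PER_STREAM_TPS = 50   # hypothetical per-sequence decode speed

for batch in (10, 64, 256):
    print(f"batch {batch:>3}: {payback_days(TOKENS, PER_STREAM_TPS, batch):7.1f} days")
```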

Inference providers have been getting a firehose of investor cash to keep the chips running (and are looking around very nervously as that firehose starts to sputter).