Hacker News

Ldorigo 20 hours ago

How do the economics of your statement work out? Inference providers clearly don't have a 10-year time to ROI on their hardware, and that's before even counting ongoing energy costs. What's missing here?

kingstnap 14 hours ago

Output tokens are actually kinda expensive for the provider.

Input tokens that hit the cache are incredibly cheap for them (and incredibly high margin too, except for DeepSeek).

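Uncached input tokens are in the middle. They can be processed very efficiently, because the whole prompt goes through the model in parallel (prefill), whereas output tokens are generated one at a time.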

Also, his math is wrong. $100k of revenue at $4.40 per million output tokens (GLM 5.2's price) is 22.7B output tokens.

At 500 tokens/s, generating 22.7B tokens takes only about 526 days, or roughly 1.44 years, which is much less than the life of the hardware.
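The payback arithmetic above is easy to script. This is a back-of-envelope sketch, not a cost model: it assumes the hardware costs $100k, sustains 500 output tokens/s around the clock, and is paid $4.40 per million output tokens, and it ignores energy, idle capacity, and input-token revenue. The function names are illustrative only.

```python
# Back-of-envelope payback time for inference hardware.
# Assumptions: continuous generation at a fixed rate, revenue only from
# output tokens, no energy/ops costs, no idle time.

SECONDS_PER_DAY = 86_400
DAYS_PER_YEAR = 365.25


def tokens_to_recoup(hardware_cost_usd: float, price_per_million_usd: float) -> float:
    """Output tokens that must be sold to earn back the hardware cost."""
    return hardware_cost_usd / price_per_million_usd * 1_000_000


def payback_days(hardware_cost_usd: float, price_per_million_usd: float,
                 tokens_per_second: float) -> float:
    """Days of nonstop generation needed to sell that many tokens."""
    tokens = tokens_to_recoup(hardware_cost_usd, price_per_million_usd)
    return tokens / tokens_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    tokens = tokens_to_recoup(100_000, 4.40)
    days = payback_days(100_000, 4.40, 500)
    print(f"{tokens / 1e9:.1f}B tokens")                         # 22.7B tokens
    print(f"{days:.0f} days, {days / DAYS_PER_YEAR:.2f} years")  # 526 days, 1.44 years
```

Adding energy costs would lengthen the payback; adding input-token revenue would shorten it.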

ac29 18 hours ago

Inference providers run batch sizes much larger than 10.
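A rough sketch shows why batch size matters so much. It assumes decoding is memory-bandwidth-bound at small batches, so aggregate throughput grows roughly linearly with batch size until the GPU becomes compute-bound. The per-sequence speed of 50 tokens/s and the batch sizes are hypothetical; the $100k hardware cost and $4.40/M price are the figures quoted upthread.

```python
# Hypothetical illustration: effect of batch size on hardware payback time.
# Assumes aggregate throughput = batch_size * per-sequence decode speed,
# which holds only roughly, and only while decode is memory-bandwidth-bound.

SECONDS_PER_DAY = 86_400


def aggregate_tokens_per_second(batch_size: int, per_sequence_tps: float) -> float:
    """Idealized linear scaling; real throughput flattens once compute-bound."""
    return batch_size * per_sequence_tps


def payback_days(hardware_cost_usd: float, price_per_million_usd: float,
                 tokens_per_second: float) -> float:
    """Days of nonstop generation needed to earn back the hardware cost."""
    tokens_needed = hardware_cost_usd / price_per_million_usd * 1_000_000
    return tokens_needed / tokens_per_second / SECONDS_PER_DAY


PER_SEQUENCE_TPS = 50.0  # hypothetical per-user decode speed

for batch in (10, 32, 64, 128):
    tps = aggregate_tokens_per_second(batch, PER_SEQUENCE_TPS)
    days = payback_days(100_000, 4.40, tps)
    print(f"batch {batch:>3}: {tps:>6.0f} tok/s -> {days:6.1f} days to recoup $100k")
```

Under these idealized assumptions, batch 10 gives about 526 days and batch 128 about 41 days. In practice KV-cache memory and latency targets cap the batch size, so the real curve bends well before that.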

bandrami 9 hours ago

Inference providers have been getting a firehose of investor cash to keep the chips running (and are looking around very nervously as that firehose starts to sputter).

dakolli 15 hours ago

https://aimultiple.com/gpu-benchmark

concurrency