A huge number of people are convinced that OpenAI and Anthropic are selling inference tokens at a loss despite the fact that there's no evidence this is true and a lot of evidence that it isn't. It's just become a meme uncritically regurgitated.

This sloppy Forbes article has polluted the epistemic environment because now theres a source to point to as "evidence."

So yes this post author's estimation isn't perfect but it is far more rigorous than the original Forbes article which doesn't appear to even understand the difference between Anthropic's API costs and its compute costs.

I'd love to be a fly on the wall when this argument is tried in front of a bankruptcy court. It drives me nuts. Of course there's evidence that they're selling tokens at a loss.

The only thing these companies sell are tokens. That's their entire output. OpenAI is trying to build an ad business but it must be quite small still relative to selling tokens because I've not yet seen a single ad on ChatGPT. It's not like these firms have a huge side business selling Claude-themed baseball caps.

That means the cost of "inference" is all their costs combined. You can't just arbitrarily slice out anything inconvenient and say that's not a part of the cost of generating tokens. The research and training needed to create the models, the salaries of the people who do that, the salaries of the people who build all the serving infrastructure, the loss leader hardcore users - all of it is a part of the cost of generating each token served.

Some people look at the very different prices for serving open weights models and say, see, inference in general is cheap. But those costs are distorted by companies trying to buy mindshare by giving models away for free, and of those, both the top labs keep claiming the Chinese are distilling them like crazy including using many tactics to evade blocks! So apparently the cost of a model like DeepSeek is still partly being subsidized by OpenAI and Anthropic against their will. The cost of those tokens is higher than what's being charged, it's just being shifted onto someone else's books. Nice whilst it lasts, but this situation has been seen many times in the past and eventually people get tired of having costs externalized onto them.

For as long as firms are losing money whilst only selling tokens, that means those tokens are selling at a loss. To not sell tokens at a loss the companies would have to be profitable.

Actually you can slice out a lot of things. It's even a GAAP metric, i.e. one of the common baseline that public companies are required to report, known as gross margin, literally just (revenue - cogs) / revenue. It is distinct from net margin, but both are useful and low gross vs net margin say very different things concerning the long-term prospects of the business.

The article is about compute cost though. By "lose money on inference" I mean the assertion that inference has negative gross margins which a lot of people truly believe. This is important because it's common to reason from this that LLM's are uneconomical and a ticking time bomb where prices will have to be jacked up several orders of magnitude just to cover the compute used for the tokens.

But there's no such thing as compute cost in the abstract. What exactly is compute cost for AI? Does it include:

• Inference used for training? Modern training pipelines aren't just gradient descent, there's a ton of inference used in them too.

• Gradient descent itself?

• The CPUs and disks storing and managing the datasets?

• The web servers?

• The people paid to swap out failed components at the dc?

Let's say you try and define it to mean the same as unit economics - what does it cost you to add an additional customer vs what they bring in. There's still no way to do this calculation. It's like trying to compute the unit economics of a software company. Sure, if you ignore all the R&D costs of building the software in the first place and all the R&D costs of staying competitive with new versions, then the unit economics look amazing, but there's still plenty of loss-making software startups in the world.

Unit economics are a useful heuristic for businesses where there aren't any meaningful base costs required to stay in the game because they let you think about setup costs separately. Manufacturing toys, private education, farming... lots of businesses where your costs are totally dominated by unit economics. AI isn't like that.

This is all true but it isn't really important for the argument people are making. What is more important is the marginal cost per token. If each token sold is at a marginal loss, their losses would scale with usage, that simply can't be happening with API pricing. But in general, yes I agree with you and I'm sure they are taking a huge loss on Claude Code.

It depends how we are looking at the business. Absolutely at the end of the day a company is profitable or not but when thinking about inference, which is largely a commodity these days, you would first think about the marginal cost of it. That is your corner stone of the business. We have pretty clear indication that largely API tokens are being sold above the marginal cost. For especially a brand new business that’s critical and something that many unicorns never even hit.

Your right that all other costs are critical to measuring the profitability of the business but for such a young industry that’s the unknown. Does training get cheaper do we hit a theoretical limit on training. Are there further optimizations to be had.

You don’t have large capex in an industrial and then in year one argue that the business is doomed when your selling the product above the marginal cost but you have not recouped costs yet that have been capitalized.

You're missing costs.

- Amortized training costs.

- SG&A.

- Capex depreciation.

All the above impact profitability over various time horizons and have to rolled into present and projected P&L and cash flow analysis.

We have amortized training cost estimates. Inference to training compute over model lifetime is 10:1 or over for major models at major providers.

In part due to base model reuse and all the tricks like distillation. But mainly, due to how much inference the big providers happen to sell.

So, not the massive economic loss you'd need to push models away from being profitable. Capex and R&D take the cake there.

> A huge number of people are convinced that OpenAI and Anthropic are selling inference tokens at a loss despite the fact that there's no evidence this is true

Theres quite a lot of evidence, no proof I'd agree, but then there's no absolute proof I'm aware to the contrary either, so I don't know where you're getting this from.

The two pieces of evidence I'm aware of is that 1) Anthropic doesn't want their subsidised plans being used outside of CC, which would imply that the money their making off it isn't enough, and 2) last time I checked, API spending is capped at $5000 a month

Like I say, neither of these are proof, you can come up with reasonable arguments against them, but once again the same could be said for evidence on the contrary

> which would imply that the money their making off it isn't enough

I don't think this logically follows. An unlimited buffet doesn't let you resell all of the food out the backdoor. At some level of usage any fixed price plan becomes unprofitable.

I agree the 5k cap is interesting as evidence although as you said I suspect there are other reasons for it.

As for evidence against it: The Information reported that OpenAI and Anthropic are 30%+ gross margins for the last few years. Sam Altman and Dario have both claimed inference is profitable in various scattered interviews. Other experts seem to generally agree too. A quick search found a tweet from former PyTorch team member Horace He: https://x.com/typedfemale/status/1961197802169798775 and a response to it in agreement from Anish Tondwalkar former researcher at OpenAI and Google Brain.

I get the other things, but believing Altmans's words is not high on the list of things to be considered evidence.

But a simple assumption that Anthropic runs a normal large MoE LLM (which it almost certainly does) suggests that the actual price of running it (mostly energy) is pretty small.

> A huge number of people are convinced that OpenAI and Anthropic are selling inference tokens at a loss despite the fact that there's no evidence this is true and a lot of evidence that it isn't.

I think it’s fairly obvious that Anthropic is lighting cash on fire and focusing on whether or not they’re losing money per token on inference is missing the forest for the trees.

Tokens become less valuable when the models aren’t continuously trained and we have zero idea what Anthropic is paying for training.

They are and they are convinced the cost is not truly baked in because you need to factor in all the training and R&D. It’s a mixture of folks that 1) are convinced AI is terrible, 2) hate Sam Altman and 3) don’t understand how business price products.

We don’t have clear evidence either way but it heavily leans to API pricing at least covering inference cost. Models these days have less and less differentiation and for API use there must be some thought to compete on cost but it’s not going to be winner take all. They leap frog each other with each new model.

Does this not count as evidence? I would agree that it sounds a little shaky, but I would not say there is no evidence.

https://www.wheresyoured.at/oai_docs/

[deleted]

I think the wafer scale compute is a massive deal. It's already being leveraged for models you can use right now and the reception on HN has been negligible. The entire model lives in SRAM. This is orders of magnitude faster than HBM/DRAM. I can't imagine they couldn't eventually break even using hardware like this in production.