The narrative is that inference on existing models is profitable. All of the profits and many billions of additional capital invested go into training the next model, which is some multiple more expensive to train than the last. Each new model generation also leads to more revenue growth. Newer models are more compute-efficient when distilled (so could possibly be higher margin), but they also work on longer time-horizon tasks and can make greater use of test-time compute, which increases token counts. So the inference ROI on each model can pay back the cost of training it, but future growth demands put all that money and more into training the next model. The numbers we’d need to prove whether this is true are not public, but it makes sense and fits what info we do have.
Theoretically, if training more expensive models stops resulting in better capabilities or isn’t economically viable, the labs can shift gears into making profit on old models. A lot of future growth is priced in so this would lead to a collapse in share price if it happens anytime soon.
There’s a story out that Anthropic might be profitable this quarter. This is in one sense bad news - it means that the company wasn’t aggressive enough about acquiring capacity last year, because they didn’t foresee how fast their inference business would grow. Anthropic is now forced to make suboptimal choices about serving existing users vs. training the next model (it has to scrounge for capacity by paying other players like SpaceX). And as a Claude Code user I feel like I’ve been affected by that, what with the random outages and performance degradations.
Wait till people find out you can have the same or close to the same output at 1/100th of the price.
You can't possibly believe we'll just be spending more and more on tokens endlessly.
And if the margins are so good for Anthropic, they will collapse. There's too much competition in the field.
I don’t believe similar scores on small bounded tasks mean models are interchangeable. I’ve found that heavy token-burning workflows are good for my productivity (letting multiple sessions run async working on different stuff). Claude ultracode is an easy example to point to, but there are tons of harnesses out there doing similar things. I find using a higher quality model matters because it affects how far it can get unattended before heading in the wrong direction. I’ve tried using the cheaper/faster models and it’s a real downgrade (or completely useless). A model that’s even smarter with a longer time horizon would be even better for my productivity. I don’t think we are at the ceiling for model quality or price. My employer pays a lot for my tokens but it’s still a lot less than they pay me.
I agree Anthropic faces some risk they could get commoditized, but on the other hand if things go well they could end up leading adoption into more industries. There are upside and downside scenarios. Recursive self-improvement is obviously an important unknown and could lead to winner-take-all.
There's the "how much of my company exists in a black box controlled by some asshole" angle as well, but in my mind the biggest issue is that current models are already capable of saturating a dev in like four hours.