I don't think it's a secret that AI companies are losing a ton of money on subscription plans. Hence the stricter rate limits, the new $200+ plans, the push toward advertising, and so on. The real money is in per-token billing via the API (and large companies having enough AI FOMO that they blindly pay the enormous invoices every month).
They are not losing money on subscription plans. Inference is very cheap - just a few dollars per million tokens. What they’re trying to do is bundle R&D costs with inference so they can fund the training of the next generation of models.
Banning third-party tools has nothing to do with rate limits. They're trying to position themselves as the Apple of AI companies: a walled garden. They may soon discover that screwing developers is not a good strategy.
They are not 10× better than Codex; on the contrary, in my opinion Codex produces much better code. Even Kimi K2.5 is a very capable model that I find at least on par with Sonnet and very close to Opus. Forcing people to use ONLY a broken Claude Code UX with a subscription only ensures they lose the advantage they had.
> "just a few dollars per million tokens"
Google AI Pro is like $15/month for practically unlimited Pro requests, each of which can take a million tokens of context (and also perform thinking, free Google Search grounding, and inline image generation if needed). This includes Gemini CLI, Gemini Code Assist (VS Code), the main chatbot, and a bunch of other vibe-coding projects which have their own rate limits or no rate limits at all.
It's crazy to think this is sustainable. It'll be like Xbox Game Pass - start at £5/month to hook people in and before you know it it's £20/month and has nowhere near as many games.
OpenAI released ChatGPT only 4 years ago, but…
Google has made custom AI chips for 11 years — since 2015 — and inference costs them 2-5x less than it does for every other competitor.
The landmark paper that invented the techniques behind ChatGPT, Claude and modern AI was also published by Google scientists 9 years ago.
That’s probably how they can afford it.
I agree that the TPUs are one of the things that are underestimated (based on my personal reading of HN).
Google already has a huge competitive advantage: they have more data than anyone else, they own the Android platform, and they bundle Gemini into every Android phone to siphon even more data. The TPUs truly make me believe there actually could be a sort of monopoly on LLMs in the end, even though there are so many good models with open weights, so few (technical) reasons to create software that only integrates with Gemini, etc.
I believe Google will end up with the lion's share of inference. OpenAI and Anthropic will have a very hard time fighting this.
From the UK I can see it's £18.95, which is almost double that. I guess this is an oversight on your part, or maybe you're quoting from memory.
I’m not familiar with the Claude Code subscription, but with Codex I’m able to use millions of tokens per day on the $200/mo plan. My rough estimate was that if I were on API billing, it would cost about $50/day, or $1200/mo. So either the API has a 6x profit margin on inference, the subscription is a loss leader, or they just rely on most people not going anywhere near the usage caps.
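A back-of-the-envelope sketch of that estimate (the per-token prices and daily volumes below are illustrative assumptions, not OpenAI's published rates):

```python
# Rough API-equivalent cost of heavy subscription usage.
# Every number here is an illustrative assumption.
input_price_per_mtok = 2.00    # $ per million input tokens (assumed)
output_price_per_mtok = 8.00   # $ per million output tokens (assumed)

daily_input_mtok = 20.0        # ~20M input tokens/day of agentic use (assumed)
daily_output_mtok = 1.25       # output is typically a small fraction of input

daily_cost = (daily_input_mtok * input_price_per_mtok
              + daily_output_mtok * output_price_per_mtok)
monthly_cost = daily_cost * 24  # ~24 usage days per month (assumed)

print(f"~${daily_cost:.0f}/day, ~${monthly_cost:.0f}/month")
# ~$50/day, ~$1200/month -- versus a $200/month subscription.
```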
I use the GLM Lite subscription for personal use. It is advertised as 3x Claude Code Pro (the $20 one).
The 5-hour allowance is somewhere between 50M and 100M tokens, from what I can tell.
On the $200 Claude Code plan you would need to be burning hundreds of millions of tokens per day to make Anthropic hurt.
IMHO subscription plans are totally banking on many users underusing them. Also, LLM providers don't like to give exact numbers (how much you get, etc.).
How's GLM treating you?
At the moment I cannot complain.
For small personal projects it's great value for money. The cheapest subscription was like $3 during New Year's, and the token quota is acceptable to me (my guess is it's about 50-100M tokens per 5h).
Dunno how it would be with big projects, but for "personal project" things it feels to me that GLM-4.7 is 80-90% of Claude Opus 4.5. Just a tiny bit more hand-holding for GLM.
It's the latter. It's the average use that matters. Though I suspect API margins are also probably higher than people think.
Inference might be cheap, but I'm 100% sure Anthropic has been losing quite a lot of money on their subscription pricing with power users. I can literally see the comparison between what my colleagues' Claude usage costs with an API key vs with a personal subscription, and the delta is just massive.
I wonder how many people have a subscription and don’t fully utilize it. That’s free money for them, too.
The trick is that the jump goes from $20 to $100 for the Pro to Max subscription. Pro is not enough for me, Max is too much. $60 would be ideal, but even at $100 it's worth the cost.
But this is how every subscription works. Most people lose money on their gym subscription, but the convenience keeps us paying.
What can bite them in this case though is alternate providers at the same price point that can bridge the gap. e.g. you currently get a lot more bang for your buck with the $20 OpenAI Codex subscription than you get for the $20 Claude Code subscription.
Of course they bundle R&D with inference pricing; how else could they recoup that investment?
The interesting question is: In what scenario do you see any of the players as being able to stop spending ungodly amounts for R&D and hardware without losing out to the competitors?
In the scenario where that market collapses, i.e. when we stop making significant gains with new models. It might be a while, though; who knows.
> They are not losing money on subscription plans. Inference is very cheap - just a few dollars per million tokens. What they’re trying to do is bundle R&D costs with inference so they can fund the training of the next generation of models.
You've described every R&D company ever.
"Synthesizing drugs is cheap - just a few dollars per million pills. They're trying to bundle pharmaceutical research costs... etc."
There are plenty of legitimate criticisms of this business model and of Anthropic, but pointing out that R&D companies sink money into research and then charge more than the marginal cost for the final product isn't one of them.
I’m not saying charging above marginal cost to fund R&D is weird. That’s how every R&D company works.
My point was simpler: they’re almost certainly not losing money on subscriptions because of inference. Inference is relatively cheap. And of course the big cost is training and ongoing R&D.
The real issue is the market they’re in. They’re competing with companies like Kimi and DeepSeek that also spend heavily on R&D but release strong models openly. That means anyone can run inference and customers can use it without paying for bundled research costs.
Training frontier models takes months, costs billions, and the model is outdated in six months. I just don’t see how a closed, subscription-only model reliably covers that in the long run, especially if you’re tightening ecosystem access at the same time.
Yes, and my point is that thinking the cost of subscriptions is only inference, and not the research, is mistaken.
They can totally lose money on subscriptions despite the costs of inference, because research costs have to be counted too.
> Yes, and my point is that thinking the cost of subscriptions is only inference, and not the research, is mistaken.
Of course they are losing money when you factor in R&D. Everybody knows that. That is not what people mean when they say that they "lose money" on subscriptions.
> That is not what people mean
I don't really think that view is as widespread as you believe.
Didn't OpenAI spend like 10 billion on inference in 2025? Which is around the same as their total revenue?
Why do people keep saying inference is cheap if they're losing so much money from it?
When you have 800–900 million active users, no matter how cheap it is, your costs will be in the billions.
They paid about $10B for inference and had about $10B in revenue in 2025. The user count and the number of zeroes on those figures are not relevant. What is relevant is the ratio of those numbers. They apparently are not even profitable on inference, which is the cheap part of the whole business.
And the cost of inference tripled from $3B in 2024 to $10B in 2025, so the cost of revenue grows linearly with the number of users, i.e. it does not get cheaper at scale.
https://www.wheresyoured.at/oai_docs/
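For what it's worth, here is the ratio in question as a tiny sketch (the inference costs and 2025 revenue are the figures cited above; the 2024 revenue is my assumption based on widely reported numbers):

```python
# Inference cost as a fraction of revenue, using this thread's figures.
inference_cost = {2024: 3e9, 2025: 10e9}  # $3B -> $10B (cited above)
revenue = {2024: 4e9, 2025: 10e9}         # 2024 ~$4B is an assumption

for year in (2024, 2025):
    ratio = inference_cost[year] / revenue[year]
    print(f"{year}: inference is {ratio:.0%} of revenue")
# If this ratio hovers near 100%, scale alone is not making inference
# cheaper relative to revenue.
```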
What walled garden, man? There's like four major API providers for Anthropic.
For example, OpenAI’s agent (Codex) is open source, and you can use any harness you want with your OpenAI subscription. Anthropic keeps its tooling closed source and forbids using third-party tooling with a Claude subscription.
Except all those GPUs running inference need to be replaced every 2 years.
Why?
They wear down from being run at 100% all the time. Support slowly drops off, and the architecture and even the rack format become deprecated.
GPUs do not wear down from being run at 100%, unless they're pushed past their voltage limits or gravely overheating.
You can buy a GPU that's been used to mine bitcoin for 5 years with zero downtime, and as long as it's been properly taken care of (or better, undervolted), that GPU functions exactly the same as a 5-year-old GPU in your PC. Probably even better.
GPUs are rated to do 100%, all the time. That's the point. Otherwise it'd be 115%.
Yeah that's not how it works in practice in a datacenter with the latest GPUs, they are basically perishable goods.
You don't run your gaming PC 24/7.
No, you're fundamentally wrong. There's regular wear and tear on GPUs, which all have varying levels of quality, and you'll have blown capacitors (just as you do with any piece of hardware), but running in a datacenter does not damage them more. If anything, they're better taken care of and will last longer. However, since instead of having one 5090 in a computer somewhere you have a million of them, a 1% failure rate quickly becomes a big number. My example involved mining bitcoin because, just like datacenters, those GPUs were running in massive farms of thousands of devices. We have the proof and the numbers: running at full load with proper cooling and no overvolting does not damage hardware.
The only reason they're "perishable" is because of the GPU arms race, where renewing them every 5 years is likely to be worth the investment for the gains you make in power efficiency.
Do you think Google has a pile of millions of older TPUs they threw out because they all failed, when chips are basically impossible to recycle? No, they keep using them; they're serving your nanobanana prompts.
GPU bitcoin mining rigs had a high failure rate too. It was quite common to run at 80% power to keep them going longer. That's before taking into account that the more recent generations of GPUs seem to be a lot more fragile in general.
Mining rigs also used more milk cartons than datacenter racks; [hot/cold] aisles? No, piles! Not to mention the often questionable power delivery...
AI data centers are also incentivised to reduce costs as far as they can. They could absolutely be running them in questionable setups
Indeed, fair point. I'd hope the larger players would be better... but I know better
Yeah, what's crazy is that most of these companies are making accounting choices that obscure the true cost, by extending the stated useful life of their equipment, in some cases from 3 years to 6. Perfectly legal, and it has the effect of suppressing depreciation expenses and inflating reported earnings.
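To illustrate the effect (with made-up numbers): straight-line depreciation over 6 years instead of 3 halves the annual expense for the same fleet, and the difference flows straight into reported earnings.

```python
# Straight-line depreciation: annual expense = cost / useful life.
# The fleet cost is an illustrative assumption.
fleet_cost = 30e9  # $30B of GPUs (assumed)

for useful_life_years in (3, 6):
    annual_expense = fleet_cost / useful_life_years
    print(f"{useful_life_years}-year life: ${annual_expense / 1e9:.0f}B/year of expense")

# 3-year life: $10B/year; 6-year life: $5B/year.
# Extending the stated life defers $5B/year of expense, boosting earnings today.
```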
But don't they palpitate for those sweet depreciation credits to decrease their tax on revenue?
Small sacrifice to not spook investors and the market.
"They're not losing money on subscriptions, it's just their revenue is smaller than their costs". Weird take.
It means the marginal cost to sell another subscription is lower than what they sell it for. I don't know if that's true, but it seems plausible.
The secret is there is no path to making that back.
My crude metaphor to explain it to my family is that gasoline has just been invented and we're all being lent Bentleys to get us addicted to driving everywhere. Eventually we won't be given free Bentleys, and someone is going to be holding the bag when the infinite money machine finally has a hiccup. The tech giants are hoping their gasoline is the one we all crave when we're left depending on driving everywhere and the costs go soaring.
Why? Computers and anything computer related have historically been dropping in prices like crazy year after year (with only very occasional hiccups). What makes you think this will stop now?
Commodity hardware and software will continue to drop in price.
Enterprise products with sufficient market share and "stickiness", will not.
For historical precedent, see the commercial practices of Oracle, Microsoft, Vmware, Salesforce, at the height of their power.
> Commodity hardware and software will continue to drop in price.
The software is free (citation: CUDA, nvcc, LLVM, ollama/llama.cpp, Linux, etc.)
The hardware is *not* getting cheaper (unless we're talking a 5+ year timeframe), as most manufacturers are signaling the current shortages will continue for ~24 months.
> The software is free (citation: CUDA, nvcc, LLVM, ollama/llama.cpp, Linux, etc.)
If you factor in the cost of integration and ongoing maintenance - by humans or LLMs - it is not free. But it certainly has never been cheaper.
> The hardware is not getting cheaper (unless we're talking a 5+ year timeframe)
Yes, that's the time I'm talking about.
You also had a blip with increasing hard disk prices when Thailand flooded a few years ago.
GB300 NVL72 is 50% more expensive than GB200 I've heard.
It has stopped. Demand is now rising faster than supply in memory, storage and GPUs.
We see vendors reducing memory in new smartphones in 2026 vs 2025, for example.
At least for the moment, the era of falling consumer tech hardware prices is over.
Memory and storage have always been very cyclical. This is nothing new.
In the GP's analogy, the Bentley can be rented for $3/day, but if you want to purchase it outright, it will cost you $3,000,000.
Despite the high price, the Bentley factory is running 24/7 and still behind schedule due to orders placed by the rental-car company, who has nearly-infinite money.
On the consumer side, looking at a few past generations, I question that. I would guess that we are nearing some sort of plateau there, or are already on it. There was inflation, but even setting aside the latest jump in RAM prices, the gains relative to cost were not that massive.
Please show me where any AI company is currently turning a profit with their current offering and price structure, then let's have that conversation.
Recent price trends for DRAM, SSDs, hard drives?
Short term squeeze, because building capacity takes time and real funding. The component manufacturers have been here before. Booms rarely last long enough to justify a build-out. If AI demand turns out to be sustained, the market will eventually adapt by building supply, and prices will drop. If AI demand turns out to be transient, demand will drop, and prices will drop.
Cars have also been dropping in price.
And knives apparently.
I recently encountered this randomly -- knives are apparently one of the few products that nearly every household has needed since antiquity, and they have changed fairly little since the bronze age, so they are used by economists as a benchmark that can span centuries.
Source: it was an aside in a random economics conversation with ChatGPT (grain of salt?).
There is no practical upshot here, but I thought it was cool.
Yeah, I'd definitely take that knife thing with a grain of salt. I have most of a history degree and took a lot of econ classes (before later going back for CS); it's a topic I'm very interested in, and I've never heard that (and some digging didn't turn up anything).
It’s also false that the technology has changed very little.
The jumps from bronze to iron to steel to modern steel and sometimes to stainless steel all result in vastly different products. Not to mention the advances in composite materials for handles.
Then you need to look at substitute goods and what people actually used knives for.
A huge amount of the demand for knives evaporated thanks to societal changes and substitute goods like forks. A few hundred years ago the average person had a knife that was their primary eating utensil, a survival tool, and a self defense weapon. Knives like that exist today but they’re not something every household has or needs.
This is a good example of why learning from ChatGPT is dangerous. This is a story that sounds very plausible at first glance, but doesn’t make sense once you dig in.
Interesting. I am glad you commented. It's nice getting grounding from someone with a real background in the area.
With that said, if it is a hallucination (and it sounds like it was), it's one of the more interesting ones I have encountered. It almost has the shape of a good idea.
Blade and handle materials have certainly changed over the years, but I think good arguments about how relevant that is could be made both ways. They remain handled cutting tools, used in the same general way, for the same general purposes (though as you pointed out, some use cases have gone away). Basically anyone from any of these periods would recognize a knife from any other period, and would be able to pick it up and make immediate use of it for all their normal knife-related purposes.
To be clear though, I am now siding with the clankers and arguing for a hallucination. It's an interesting thing to think about, but it sounds like it's not an established concept in any way, shape, or form.
Evidence for this claim?
I had a 1990 Ford Taurus as my first car. I got it used, and I remember it being completely impossible to afford a new car at the time.
Its sticker price was $33,000 adjusted for inflation:
https://en.wikipedia.org/wiki/Ford_Taurus_%28second_generati...
I don't think it would even feel safe to drive at all compared to what we have gotten used to with modern cars. It broke down 3 times while I had it and stranded me on the road. No cell phone, of course, to call anyone.
These were the mythic "good ol' days".
A few generations ago almost nobody could afford a car; now many low-income families can afford two.
Maybe cars are not cheaper, just easier to finance due to the modern credit systems?
I like this analogy.
I also think that we, as ICs, are being given Bentleys while they're trying to invent Waymos to put us all out of work.
Humans are the cost center in their world model.
The path is charging just a bit less than the salary of the engineers they are replacing.
After hearing this 10 times a day for the last 5 years, I'm starting to get a bit tired. Do you have a rough timeline for when this great replacement is coming? 1 year? 2? 5? If it's longer than that, can we shut up about it for a few years, please?
It’s happening already? Ask any new CS grads about how good the job market is.
A poor economy that is still dealing with a decade+ of ZIRP, COVID shock, tariffs, and political strife: I don't see how AI has much, if anything, to do with it compared with those other explanations.
If AI was truly this productive they wouldn't be struggling so hard to sell their wares.
How do I work out what sustainable pricing would be?
Depends on how you do the accounting. Are you counting only inference costs, or are you amortizing next-gen model development costs? "Inference is profitable" is oft repeated and rarely challenged. Most subscription users are low-intensity users, after all.
I agree; unfortunately, when I brought up that they're losing money before, I got jumped on by people demanding I "prove it", and I guess pointing at their balance sheets isn't good enough.
The question I have: how much are they _also_ losing on per-token billing?
From what I understand, they make money on per-token billing. Not enough to cover training, marketing, subscription services, and research for new models, but the more the API is used, the less money they lose.
Finance 101 TL;DR explanation: the contribution margin (= price per token - variable cost per token) is positive.
Profit = contribution margin × quantity - fixed costs.
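A minimal sketch of that point, with every number an illustrative assumption:

```python
# Positive contribution margin per token, but profit must also cover
# fixed costs (training, R&D, salaries). All numbers are assumptions.
price_per_mtok = 5.00          # $ revenue per million tokens (assumed)
variable_cost_per_mtok = 1.00  # $ inference cost per million tokens (assumed)
fixed_costs = 5e9              # $5B/year of training + R&D (assumed)

contribution_margin = price_per_mtok - variable_cost_per_mtok  # $4 per Mtok
breakeven_mtok = fixed_costs / contribution_margin

print(f"Break-even volume: {breakeven_mtok:.2e} million tokens/year")
# ~1.25e9 million tokens, i.e. about 1.25 quadrillion tokens per year,
# before the fixed costs are covered.
```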
Do they make enough to replace their GPUs in two years?
If 100% of the money they spend were on inference priced by tokens (they don't say anything about subscriptions, so I assume they lose money there), then yes, they make money. But their expenses are way higher than inference alone. They can cover the GPU cost if they sell tokens, but in reality that isn't the whole picture, because they also have to constantly train new models, market subscriptions, fund R&D, and pay overhead. Anthropic in general has lost way less money than its competitors. I'll take these numbers, in particular the projected break-even, with a grain of salt, but Googling says (gross margin here being how much money they make with the GPUs): "Gross Margins: Projected to swing from negative 94% last year to as much as 50% this year, and 77% by 2028. Projected Break-even: The company expects to be cash flow positive by 2027 or 2028."
I will not be so bullish as to say they will not collapse (I have zero idea how much real debt and how many commitments they have, whether spending falls sharply after the bubble pops, or whether there will be a new DeepSeek moment), but this sounds like a good trajectory, all things considered. I heavily doubt the $380 billion valuation, though.
"this is how much is spendeed in developers between $659 billion and $737 billion. The United States is the largest driver of this spending, accounting for more than half of the global total ($368.5 billion in 2024)" so is like saying that a 2% of all salaries of developers in the world will be absorbed as profit whit the current 33.3 ratio, quite high giving the amount of risk of the company.
https://www.ismichaelburryright.com/
is my go-to reference for debt numbers, etc.
Why do you think they're losing money on subscriptions?
Does a GPU doing inference serve enough customers for long enough to bring in enough revenue to pay for a replacement GPU in two years (plus the power/running cost of the GPU and infrastructure)? That's the question you need to be asking.
If the answer is yes, then they are making money on inference. If the answer is no, the market is going to have a bad time.
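One rough way to frame that question (every figure below is an assumption for illustration):

```python
# Does token revenue over a 2-year GPU life cover the GPU plus running costs?
# All figures are illustrative assumptions.
gpu_cost = 40_000.0            # $ per GPU, installed (assumed)
power_and_overhead = 15_000.0  # $ power + cooling + infra over 2 years (assumed)
tokens_per_second = 5_000      # sustained batched throughput (assumed)
utilization = 0.5              # fraction of time actually serving load (assumed)
revenue_per_mtok = 3.00        # $ blended revenue per million tokens (assumed)

seconds = 2 * 365 * 24 * 3600
tokens_served = tokens_per_second * utilization * seconds
revenue = tokens_served / 1e6 * revenue_per_mtok

print(f"2-year revenue ${revenue:,.0f} vs cost ${gpu_cost + power_and_overhead:,.0f}")
# With these assumptions the GPU pays for itself several times over;
# lower utilization, throughput, or prices can flip the picture.
```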
Because they're not saying they are making a profit.
That doesn’t mean that the subscription itself is losing money. The margin on the subscription could be fine, but by using that margin to R&D the next model, the org may still be intentionally unprofitable. It’s their investment/growth strategy, not an indictment of their pricing strategy.
They have investors that paid for training of these models too. It could be argued that R&D for the next generation is a separate issue, but they need to provide a return on the R&D in this generation to stay in business.
The return of R&D can just be an inflated valuation, there's no immediate need to make actual money.
But why does it matter which program you use to consume the tokens?
That sounds like a confession that Claude Code is somewhat wasteful with token use.
No, it's a confession they have no moat other than trying to hold onto the best model for a given use case.
I find that competitive edge unlikely to last meaningfully in the long term, but this is still a contrarian view.
More recently, people have started to wise up to the view that the value is in the application layer:
https://www.iconiqcapital.com/growth/reports/2026-state-of-a...
But if more users use your service, you get an advantage. Letting users choose their own tool for that would be a good thing.
Honestly, I think I am already sold on AI. Who is the first company that is going to show us all how much it really costs and start the enshittification? First to market wins, right?