When discussing LLM pricing, people are missing the plot. The subscription token price is 10x-40x cheaper than API pricing. Your $90 Claude subscription gives you close to $1000 to $4000 in equivalent API token pricing.
The second issue is that the quality of the model “operator” makes a massive difference in the outcomes. Highly skilled senior devs who know how to prompt and have high agency will outperform team members who lack motivation and foundational skills.
Lastly, there is a massive difference in capabilities, determinism, and error handling between 5T SOTA models like Opus and tiny distillations from DeepSeek that perform well only in benchmarks.
I learned today that the Anthropic "Enterprise" plan - the one big companies use because they need governance features and audit logs and all of that jazz - is billed at API token rates (plus $20/seat/month).
So large companies are getting billed a lot more than those discount subscription plans.
> I learned today that the Anthropic "Enterprise" plan - the one big companies use because they need governance features and audit logs and all of that jazz - is billed at API token rates (plus $20/seat/month).
Can't large enterprises just use the API? I have audit logs and what seem to be enterprise features through my Anthropic account (platform.claude.ai).
Anything over 150 seats means you need to pay at token rates plus the $20/user. My day job is operational (no coding at all) and I'm spending ~$300 a month on a few chats a day with Claude/Cowork.
I hope your company is keeping the input/response pair in case they need to break free at some point.
Wouldn’t people mostly just want any artifacts?
$300 is my employer's monthly cap on Claude Enterprise. It lasts me at most a week of moderate use. I would much rather get Codex Pro and Claude Pro or Max, which would cost ≤ $200. For $300, one could also add Gemini Ultra to the mix so I could have all three review each other's code, etc.
Claude can be very good but enterprise pricing doesn't make sense to me.
The $200 plan you're talking about is subsidized by Anthropic. They cannot afford to keep offering that to everyone indefinitely. Absolute best case scenario for current users is that they can continue to subsidize it as a way to sell enterprise plans, but there's no way that they can keep offering it to everyone at those prices.
Governance and audit trail are incredibly valuable to large enterprise organizations, especially those working in regulated spaces. Companies will pay a premium if the security/privacy/compliance issues are handled effectively.
Yep, where I work I know people easily spending over a few thousand dollars a month.
We are on it at my job. It saves money because other parts of the org don't use as many tokens.
The really cost-effective way is giving a team $20 Cursor, $20-100 Claude, and $20-200 Codex subscriptions.
I'm easily spending $1k on Claude Enterprise, and that's while trying to spread usage across Codex and Cursor using pi.
I've heard that the $20/seat gets waived if you have large enough committed spend.
Would they even care at that scale, if the average employee spends $3000 every month because mgmt mandates slopmaxxing?
> Lastly, there is a massive difference in capabilities, determinism, and error handling between 5T SOTA models like Opus
What's your source for Opus being a 5T model?
> and tiny distillations from DeepSeek that perform well only in benchmarks.
I don't think you know what you're talking about. Local models aren't “distillations from Deepseek”.
And they don't perform well “only in benchmarks”, Qwen 3.6 is a very decent model (obviously it's not Opus, but it's also much faster and speed is a quality of its own).
https://arxiv.org/abs/2604.24827
From this paper
That's not what the paper says though:
According to their estimation, Opus is likely between 1T and 15T, which really doesn't tell you much that you couldn't have guessed otherwise. It doesn't say “Opus is a 5T model”. The fact that there's absolutely no consistency in the predicted size between models from the same lab should tell you all you need to know about the predictive power of this method (and they aren't really lying about their numbers; their confidence interval is huge enough to fit anything in it, but their prose is making very strong claims out of their statistical nothingburger).
(somebody already posted this paper earlier, and I spent some time reading it, and this paper is really not that good even though there are a bunch of interesting ideas in it).
> What's your source for Opus being a 5T model?
Elon Musk tweeted that Grok is 0.5T or 1/10th the size of Opus. https://xcancel.com/elonmusk/status/2042123561666855235#m
While this source's reliability is certainly debatable, the size matches the results of this paper, in which researchers estimated the parameter count from model knowledge. https://01.me/research/ikp/
> While this source's reliability is certainly debatable
Massive understatement. Nowadays it has become hard to find a single Musk statement that doesn't contain at least one lie.
> the size matches the results of this paper, in which researchers estimated the parameter count from model knowledge. https://01.me/research/ikp/
Thanks for the pointer. This estimation has Grok 6 times bigger than Musk claims it is, so maybe that's where the lie is.
(I'm quite skeptical about that number, though: it would be quite disappointing for US tech if its flagship models had to be that much larger than the Chinese ones for such a small edge in performance. Because I don't think US labs are incompetent, I'd bet that US flagships aren't more than 2 or 3 times bigger than Chinese flagships. Otherwise it really doesn't bode well.)
In tiny gray text right above the table is written "90% PI ≈ ±3.00× either side." Is GPT-5.5-Pro 3.4T or 30.8T in size, or somewhere in between? We just don't know.
> What's your source for Opus being a 5T model?
Probably Elon Musk: https://eu.36kr.com/en/p/3760679047267075
Sigh, it's the year 2026 and there are still people believing something Musk says…
People can simultaneously be reprehensible idiots while being a reliable expert on something they have personally invested billions of dollars into and operate at scale.
> ...while being a reliable expert on something they have personally invested billions of dollars into and operate at scale.
Like "Full Self-Driving" from coast-to-coast by 2016?
He's also invested billions of dollars in SpaceX and Tesla... which he regularly makes wild claims about that are untrue.
I'm not saying he actually is an expert, but he could be an expert and still lie for any number of reasons.
Elon is a specialist in lying about stuff he invested billions in to make it look more valuable than it is (he's been doing that for Tesla for years). It's not a lack of expertise, it's the lack of any sense of integrity (and self respect).
He's lagging in the AI race despite having tons of compute available, so he's trying to build a narrative that the model isn't behind, it's just smaller than the competition's.
It's not like the non-frontier models aren't improving. If someone can use DeepSeek to get 90% of the work done for $100 and then pay another $100 to Anthropic or OpenAI to complete it, I think they would rather do that than pay Anthropic or OpenAI $1000.
> The subscription token price is 10x-40x cheaper than API pricing
This is a temporary phenomenon. Expect either drastic price increases or draconian throttling or both in the coming months.
These companies are operating at huge losses and have hundreds of billions in liabilities and commitments. They need to turn on the money faucet sooner rather than later.
Even with increased prices, AI enables velocity in both development and bug fixing. Would companies want that? If prices start biting, I think companies will route all development and bug-fixing requests through a few superperformer developers with complete knowledge of the different components within the company (they will be the queen bees holding the company on their heads). The rest of the company will be tasked with requirement gathering, spec cleanup, disambiguation, and so on (worker bees).
So kinda like how stuff is now at a lot of big companies? I've worked at many different companies and almost always there are a few out-performers and a lot of people doing just enough not to get fired (no hate, power to them lol).
We're already seeing companies slash their AI budgets. I expect that will increase until we hit more of an equilibrium.
From what I understand, that is sort of how IBM Bob works - multiple models behind the scenes and they route the request to the model that will handle it best at the lowest price.
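A minimal sketch of that kind of cost-based routing: pick the cheapest model that clears some capability bar for the task. The model names, capability scores, and prices below are hypothetical placeholders, and this is a generic illustration of the idea, not how IBM Bob actually decides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str            # hypothetical model name
    capability: int      # hypothetical 1-10 quality score
    usd_per_mtok: float  # hypothetical blended price per million tokens


def route(required_capability: int, options: list[ModelOption]) -> ModelOption:
    """Return the cheapest model whose capability meets the requirement."""
    eligible = [m for m in options if m.capability >= required_capability]
    if not eligible:
        raise ValueError("no model meets the required capability")
    return min(eligible, key=lambda m: m.usd_per_mtok)


catalog = [
    ModelOption("small-open-model", capability=4, usd_per_mtok=0.5),
    ModelOption("mid-tier-model", capability=7, usd_per_mtok=3.0),
    ModelOption("frontier-model", capability=9, usd_per_mtok=25.0),
]

print(route(3, catalog).name)  # easy task goes to the cheap model
print(route(8, catalog).name)  # hard task goes to the frontier model
```

The hard part in practice is estimating `required_capability` for an incoming request; here it's simply passed in as an assumption.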
Incentives matter…
If prices keep going up, watch for companies to exit frontier models and go to local llama.cpp instances for 6-month-ago SOTA, with the flex of being housed within the office - no more privacy leakage, no more price gouging.
To be honest, I’m not sure why a Y-Combinator backed company hasn’t come out yet flooding the market with highly capable OPAI (pronounced “Oh-pah”, as in what Greeks shout when they drink shots), which stands for “On-Prem AI”.
… yes, I just made up OPAI right now lol
> If prices keep going up, watch for companies to exit frontier models and go to local llama.cpp instances for 6-month-ago SOTA, with the flex of being housed within the office - no more privacy leakage, no more price gouging.
That or just hiring people to do the work! I hear rumours that this is already starting to happen in some places (perhaps those that were a little overzealous with AI-hype driven layoffs).
I do think many will move to lower cost models or self hosted over the next few years as prices balloon. And the privacy/control story is compelling.
If we're able to see some big increases in hardware capabilities that can be self-hosted, that will be an accelerant.
That said, most companies just want to pay a provider to delegate responsibility in exchange for cost and control.
> I’m not sure why a Y-Combinator backed company hasn’t come out yet flooding the market with highly capable OPAI
If we momentarily disregard the fact that YC itself owns billions of dollars worth of OpenAI shares[1], YC would have to find demo-day investors willing to drive down the value of frontier labs. The coöpetition among VCs and the existing web of AI investments mean no VC will be interested in investing in local AI...until after the frontier labs IPO.
1. Thanks to the self-dea^w foresight of former YC president Sam Altman
There's recent reporting that Anthropic will be profitable this quarter...
edit: I see in other comments on this thread you think Ed Zitron is a reliable pundit so that explains everything.
How will it be profitable, really?
You can dismiss Ed (and me vicariously) but what's your compelling evidence to counter their extremely uphill battle towards profitability?
Either way it will be very interesting to see their S1 when they try and IPO.
If it's anything like SpaceX's then I suspect my post will age better than yours.
> When discussing LLM pricing, people are missing the plot. [ ... snipped ...] Your $90 Claude subscription gives you close to $1000 to $4000 in equivalent API token pricing.
And you think it is unreasonable to consider this unsustainable?
And the direction is definitely towards removing that subsidy really soon. We can see it with OpenAI's shift to API-equivalent pricing for enterprise customers last month. Anecdotally my company saw OpenAI credit usage grow 2x with stable use across the ChatGPT platform, which is pretty terrifying considering just 2% of the company uses Codex.
For context, ChatGPT Business subscriptions give you a fixed pool of credits to use, after which you get billed à la carte at inflated 1.75x rates vs. the API, or, if you don't want to pay, everything except the non-reasoning models is turned off for the month.
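A rough sketch of that billing structure. Only the 1.75x overage multiplier comes from the comment above; treating the included credit pool and the base fee as separate parameters is an assumption, and all the dollar figures are hypothetical.

```python
def monthly_cost(usage_at_api_rates_usd: float,
                 included_credits_usd: float,
                 base_fee_usd: float,
                 overage_multiplier: float = 1.75) -> float:
    """Hypothetical credit-pool plan: usage up to the included credits is covered
    by the base fee; anything beyond is billed at API rates times a multiplier."""
    overage = max(0.0, usage_at_api_rates_usd - included_credits_usd)
    return base_fee_usd + overage * overage_multiplier


# Hypothetical seat: $30 base fee with $50 of included credits.
light_user = monthly_cost(40.0, included_credits_usd=50.0, base_fee_usd=30.0)
heavy_user = monthly_cost(250.0, included_credits_usd=50.0, base_fee_usd=30.0)
print(light_user, heavy_user)
```

Under these made-up numbers the light user pays just the base fee, while the heavy user pays $30 + $200 × 1.75 = $380, which is why a handful of heavy users can dominate the bill.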
We also tried Claude Enterprise, which was unusable as people blew through their monthly limits in a matter of hours.
Depends on what their actual costs are. Either they are losing lots of money on subscriptions, or they make absolute bank on API pricing.
Looking at the pricing of 1-2T models like Kimi or DeepSeek on the open market, I'm tempted to assume that inference costs are closer to subscription pricing than to API pricing.
Especially considering that subscriptions a) distribute load over time via rate limits, and b) will include a lot of users who get only a fraction of the possible value, whether they are on a personal account where they hit the rate limit on the weekend but barely use it during the week, or are corporate users who were issued an account they rarely use. Subscription prices are usually set based on the average case, not the most extreme value a power user can get out of it.
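A back-of-envelope sketch of the average-case argument. The segment shares and usage figures are made up for illustration; the $90 price is the one quoted at the top of the thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    share: float          # fraction of subscribers in this group (hypothetical)
    api_equiv_usd: float  # their monthly usage valued at API list prices (hypothetical)


def average_api_equiv(segments: list[Segment]) -> float:
    """Weighted average of per-subscriber usage, valued at API prices."""
    if abs(sum(s.share for s in segments) - 1.0) > 1e-9:
        raise ValueError("segment shares must sum to 1")
    return sum(s.share * s.api_equiv_usd for s in segments)


SUBSCRIPTION_PRICE_USD = 90.0  # price quoted at the top of the thread

segments = [
    Segment(share=0.10, api_equiv_usd=3000.0),  # power users pinned at the rate limit
    Segment(share=0.30, api_equiv_usd=300.0),   # regular users
    Segment(share=0.60, api_equiv_usd=30.0),    # seats that are issued but rarely used
]

avg = average_api_equiv(segments)
power_user_multiple = 3000.0 / SUBSCRIPTION_PRICE_USD  # what a maxed-out user sees (~33x)
average_multiple = avg / SUBSCRIPTION_PRICE_USD         # what is served on average (~4.5x)
print(f"average API-equivalent usage: ${avg:.0f} ({average_multiple:.1f}x the price)")
print(f"power user: {power_user_multiple:.1f}x the price")
```

With this hypothetical mix, the power user's 30x+ "value" is not what the provider serves per subscriber on average. Note this only compares usage against list API prices; it says nothing about the provider's actual inference cost.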
> I'm tempted to assume that inference costs are closer to subscription pricing than to API pricing
So just going on vibes?
While some people don't like his content, Ed Zitron shows a lot of evidence for your assumption being very wrong.
These companies are bleeding cash at ungodly rates. It's likely their API pricing is still subsidized if you look at their overall financial picture.
Related, there's a good reason those API prices keep going up a lot every new version and it's not just because the models are better.
Selling inference for more than inference costs is not incompatible with bleeding cash at ungodly rates. They do in fact pay ungodly amounts of cash for other things, like training, marketing, etc. Heck, you can bleed cash while being profitable (in the accounting sense).
Also, API prices going up a lot with every new version is more an OpenAI thing, and even there it's a recent trend: GPT 5.0 was a big price drop compared to 4.1, and 4.1 was cheaper than 4o, which itself got a price cut at some point and is cheaper than 4. Meanwhile Anthropic's API pricing stayed stable for many versions, then got slashed to a third with the 4.2 release and has stayed at that level since.
But explain to me how these companies will recoup these costs outside of increasing inference pricing?
Their business model is selling inference but the training and other costs have to be accounted for somehow. Unless I'm missing something obvious, inference costs must go up drastically if these companies are going to survive beyond the subsidy stage.
Sell more. The hope is that there is a huge addressable market that includes huge per-worker demand in almost all white collar work and lots of inference in people's private lives.
If that doesn't work, then yes, prices will have to go up.
Considering not one company is in the black yet I don’t really know how we can say anyone is making bank, unless we want to count absurd levels of VC funding (now slowing down) I guess.
I am conveniently not counting training costs (since they add no marginal cost, selling more tokens doesn't affect them), and I'm counting hardware and DC costs only as amortized.
Of course they do have to "make bank" in some way to offset the insane training costs. But whether they go for high prices or high volume, or offer some services as a loss leader to drive profits elsewhere, is somewhat orthogonal to that.
https://www.wsj.com/tech/ai/mind-blowing-growth-is-about-to-...
Does Anthropic really expect to double their income without also doubling their expenses?
Let’s see it first. And without omitting training/infrastructure costs at that. Until then my comment is still accurate.
It's a private company; what exactly do you expect to 'see'?
Anthropic IPOs in less than 5 months, and I guarantee you any company that is officially in the black will proudly shout it from the rooftops.
> Anthropic IPOs in less than 5 months
Pure speculation. About as valuable as my linked WSJ reporting, I suppose. Given that's the case, maybe you shouldn't claim so confidently that they are money incinerators.
“pure speculation” is a bit unfair.
Back to the point: No one is profitable yet, which I think we both agree is accurate. If you are going to lean on “they will be soon” then it’s fair to say they’re going to IPO soon.
Ease off the gas. We’re just discussing a tech company.
Also, your local hardware is in no way capable of running the types of models that the cloud providers do, it’s just not economically feasible, and it never will be.
Very much dependent on the situation. For many business tasks, local hardware is good enough. But what a lot of folks overlook when saying these things is that (a) workers do more than run AI models on a piece of hardware, (b) significant computer hardware is already sitting idle outside normal work hours, when it can be running batch jobs, and (c) employees can share local hardware.
Depends on what you mean by "economically feasible".
Even very cheap mini-PCs and laptops can run any of the models run by cloud providers, albeit at a much lower speed (i.e. with the weights stored on SSDs).
Whether such a low speed is useful, depends on the application. For something like a coding assistant or bug scanning, an instant response is desirable, but certainly not necessary.
The SSD would wear out in days while the laptop generates two responses a day. This is like saying you could power your home with AA batteries, yes technically you could but in practice entirely infeasible.
Weights are write-once data.
It can run open-weight models that are roughly as capable. It's going to be slow unless you're using actual datacenter hardware, but they'll run.
"roughly" is doing a lot of heavy lifting there
The difference between datacenter hardware and cheap personal hardware is not in what can be run and what cannot be run.
Anything can also be run on a cheap computer.
The difference is in speed. A cheap computer may run a big model up to a few orders of magnitude slower than datacenter hardware, depending on whether the LLM is small enough to fit in GPU memory, or it is small enough to fit in CPU memory or it is so big that it must spill on SSDs.
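A common rule of thumb for why speed drops so sharply across those tiers: single-stream decoding is usually memory-bandwidth-bound, because every generated token has to read the active weights once. A rough upper bound, assuming a hypothetical model with 32B active parameters at ~4 bits each and ballpark bandwidth figures that are assumptions, not measurements:

```python
def max_decode_tokens_per_s(active_params: float,
                            bytes_per_param: float,
                            bandwidth_gb_per_s: float) -> float:
    """Upper bound for memory-bound, single-stream decoding: each new token
    requires reading every active weight once from where the weights live."""
    bytes_per_token = active_params * bytes_per_param
    return bandwidth_gb_per_s * 1e9 / bytes_per_token


ACTIVE_PARAMS = 32e9    # hypothetical model: 32B active parameters
BYTES_PER_PARAM = 0.5   # ~4-bit quantization

# Rough, assumed bandwidths for each storage tier (GB/s).
tiers = {
    "GPU memory": 3000.0,
    "CPU RAM": 80.0,
    "NVMe SSD": 7.0,
}

for name, bandwidth in tiers.items():
    rate = max_decode_tokens_per_s(ACTIVE_PARAMS, BYTES_PER_PARAM, bandwidth)
    print(f"{name:>10}: at most ~{rate:.2f} tokens/s")
```

Ignoring KV-cache reads, compute limits, and caching of hot weights in RAM, the GPU-to-SSD gap here comes out at a bit over two orders of magnitude. The SSD tier is pure reads, so it isn't a write-wear problem.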
Depending on the application, the tradeoff between run time and run cost may happen to favor using local hardware, despite a much slower speed.
There are plenty of applications where doing them for negligible cost during an overnight job can be preferable to obtaining faster results at a very high price, for instance scanning for bugs in a mature code base using a great number of different open-weights LLMs, which can achieve bug coverage similar to that of a single overpriced and unavailable SOTA LLM, e.g. Mythos.
NEVER will be is a pretty big leap. Never is a long time.
> it never will be.
Giving strong “640k is enough for anyone” vibes here.
Isn't the plot that it's like an infinite bikeshed, but 10% of the bikesheds are actually trailer parks, and when you finally realize it's a trailer park and not a bikeshed you're down $10-100 because its token generation is faster than you can actually validate?
Some might say the price wouldn't be great if you could actually process and validate it...
> The quality of the model “operator” makes a massive difference in the outcomes.
My hunch is that this is the source of much of the variability in outcomes upstream of HN commenters claiming extremes of, "This model changes everything!" to "This[same] model is crap."
We haven't operationalized what it means to "be good at prompting," nor developed proxies/heuristics/shibboleths for assessing prompting skill. There's community skepticism over whether prompting skill even exists. Besides, even if prompting skill is real, who wants to hear, "Actually, you kinda suck at prompting."
It's 100% this. Many people suck at prompting. It's likely that habits from search are ingrained. But in general some people are just so bad at it.
Prompting is just writing specification documents. A lot of people are very bad at this. I suppose, more to the point, that a lot of people are just bad at writing.
According to Google, “there’s no wrong way to prompt”.
https://www.youtube.com/watch?v=9bBfYX8X5aU&t=48s