> pricing "Pro" $3.48 / 1M output tokens vs $4.40

I’d like somebody to explain to me how the endless comments of "bleeding edge labs are subsidizing the inference at an insane rate" make sense in light of a humongous model like v4 pro being $4 per 1M. I’d bet even the subscriptions are profitable, much less the API prices.

edit: $1.74/M input $3.48/M output on OpenRouter
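For scale, here's a rough sketch of what those per-token rates mean for a token-heavy agentic coding session. The token counts are made-up assumptions for illustration; only the prices come from the quote above:

```python
# Rough cost of a hypothetical agentic coding session at the quoted
# OpenRouter rates. Prices are per 1M tokens; token counts are assumed.
INPUT_PRICE = 1.74   # $/1M input tokens
OUTPUT_PRICE = 3.48  # $/1M output tokens

input_tokens = 2_000_000   # context resent across many turns (assumption)
output_tokens = 150_000    # generated code and reasoning (assumption)

cost = (input_tokens / 1e6) * INPUT_PRICE + (output_tokens / 1e6) * OUTPUT_PRICE
print(f"${cost:.2f}")  # prints "$4.00"
```

Even a heavy session lands around $4 at list price, which is the scale of the question being asked.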

Because you are comparing China to the US.

In China you need to appease state goals. In the US you need to appease investor goals.

China will keep funding them regardless of their income, because the goal is (ostensibly) a state AGI/ASI. In the US, the goal is an ROI which may or may not come with AGI/ASI.

They are different economies with different goals. We can look at past Chinese national projects and see that they are fine with burning $50 to get [social goal] that's worth $5.

This price is high partly because of the current shortage of inference cards available to DeepSeek; they claimed in their press release that once the Ascend 950 compute cards launch in the second half of the year, the price of the Pro version will drop significantly.

In six months DeepSeek won't be SOTA anymore and usage will be wayyyy down.

A huge proportion of those scores are gamed anyways. Use whatever works for you at the price and availability you can afford

Only comparing on SOTA scores (ignoring price etc.) is like choosing your daily-driver by looking at who makes the fastest sports-car...

The constant improvement of SOTA is the main thing keeping the investment machine running. We can't really separate training costs from inference costs, because a bunch of the funding and loans for the inference hardware only exist because of the promises that continuous training (tries to) deliver.

Not really. SOTA vs non SOTA is "can I get my coding work actually done today" vs. "this can do customer support chat"

It is like car vs. kick scooter.

> "can I get my coding work actually done today" vs. "this can do customer support chat"

I think you need to define "can get coding work done" for this to make sense. I was using GPT-3 back then for basic scripts; does that count? Or only Claude Code?

I also think this is a false dichotomy: if you look at Project Vend or Vending-Bench, customer support etc. is by no means trivial. (Old but great story: https://www.businessinsider.com/car-dealership-chevrolet-cha...)

It really isn't. We get coding work actually done today on Opus 4.5. That's not SOTA any more, and anything proximate to that level, even quite loosely, is genuinely useful.

OK, so we're at "Opus 4.5 is not SOTA". Right, by that definition... yes, you are right.

I mean, it's almost half a year; I think that counts?

Or there will be DSv4.1/2/3 ;)

Definitely something in this realm, they call the models "preview" at a bunch of different points in the paper.

What I'm really hoping for is a double punch like with V3 -> R1.


Well, if they distilled once…

API prices may be profitable. Subscriptions may still be subsidized for power users. Free tiers almost certainly are. And frontier labs may be subsidizing overall business growth, training, product features, and peak capacity, even if a normal metered API call is profitable on marginal inference.

Research and training costs have to be amortized from somewhere, and labs are always training. I'm definitely keen for the financials when the two file for IPOs, though; it would be interesting to see, although I'm sure it won't be broken down much.

They are profitable against opex, but not capex, with the current depreciation schedules, though those are now edging higher than expected.

Amazingly, current depreciation schedules underestimate the retained value of GPUs.

In 2023, the depreciation schedule for H100s was 2 years, but they are still oversubscribed and generating significant income.

CoreWeave has now upped its GPU depreciation schedule to 6 years(!), which seems more realistic.
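To see why the schedule matters so much for "profitability", here's a minimal sketch of straight-line depreciation under the two schedules mentioned. The $30k GPU price is an assumption for illustration, not a figure from the thread:

```python
# How the depreciation schedule changes the implied monthly capex cost
# of a GPU. The $30k purchase price is an assumed round number.
gpu_price = 30_000  # USD, assumption

def monthly_depreciation(price: float, years: int) -> float:
    """Straight-line depreciation cost per month."""
    return price / (years * 12)

two_year = monthly_depreciation(gpu_price, 2)   # 1250.0 per month
six_year = monthly_depreciation(gpu_price, 6)   # ~416.67 per month

# Stretching the schedule from 2 to 6 years cuts the monthly
# capex charge by exactly 3x, which can flip a P&L from red to black.
print(two_year, six_year, two_year / six_year)
```

Same hardware, same cash out the door; only the accounting assumption about useful life changes.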

https://www.silicondata.com/blog/h100-rental-price-over-time

They got loans to buy inference hardware on the promise of potential AGI, or at least something approaching ASI, all leading to stupid amounts of profit for those investors.

We therefore cannot just look at inference costs directly; training is part of the pitch. Without the promise of continuous improvement and the chase for elusive AGI, the money for inference investment evaporates.

Prices are not just the hard cost of inference. Training costs are not equal. Chinese labs have cheaper access to large data centers. I also suspect they operate far more efficiently than orgs like OpenAI.

I was thinking the same. How can it be that other providers can offer third-party open-source models of roughly similar quality, like this one, Kimi K2.6, or GLM 5.1, for a tenth of the price? How can it be that GPT 5.5 is suddenly twice the price of GPT 5.4 while being faster? I don't believe it's a bigger, more expensive model to run; they're just starting to raise prices because they can and their product is good (which is honest as long as they're transparent about it). Honestly, the narrative about subscriptions costing the company 20 times more than we're paying is just a PR move to justify the price hike.

I'm pretty sure OpenAI and Anthropic are overpricing their token-billed API usage mainly as an incentive to commit to their subscriptions instead.

Anthropic recently dropped all included usage from new enterprise subscriptions; your seat sub gets you a seat with no usage. All usage is then charged at API rates. It's like the worst of both worlds!

What's the point then? Special conditions for data retention/non-training policies?

SSO tax is a large part of it, plus controls around the plug-in marketplace, enforcement of config, and observability of spend. But it's all pretty weak, really, for $20 a month.

And Microsoft is going the same route, moving Copilot Cowork over to a utilisation-based billing model, which is very unusual for their per-seat products (I actually can't remember that ever happening).

The target audience for the APIs is third party apps which are not compatible with the subscriptions.

True. I missed that.

My thoughts exactly. I also believe that subscription services are profitable, and the talk about subsidies is just a way to extract higher profit margins from the API prices businesses pay.

Google stated a while back that with TPUs they are able to sell at cost / at a profit.

Aka: everyone who uses Nvidia isn't selling at cost, because Nvidia is so expensive.

And they actually say the prices will be "significantly" lower in the second half of the year when the Huawei 650 chips come in.

Insert "always has been" meme.

But seriously, it just stems from the fact some people want AI to go away. If you set your conclusion first, you can very easily derive any premise. AI must go away -> AI must be a bad business -> AI must be losing money.

It is possible to question the sustainability of the AI buildout and not have a dogmatic position on AI development.

There are still major unanswered questions here. For instance, all of the incremental data center capacity buildout is going to businesses with totally unknown long-term unit economics that are burning obscene amounts of cash today.

Before the AI bubble that will burst any time now, there was the AI winter that would magically arrive before the models got good enough to rival humans.

They've also announced the Pro price will drop further in 2H26 once they have more Huawei chips.

Point taken, but there aren't any western providers there yet. Power is cheaper in China.

Not soooo much, though. It's heavily subsidized for residential consumption, but industrial power rates are almost comparable to the US (depends on the state you go to, etc.).

These models are open and there are tons of western providers offering it at comparable rates.

As this is a new arch with tons of optimisations, it'll take some time for inference engines to support it properly, and we'll see more 3rd party providers offer it. Once that settles we'll have a median price for an optimised 1.6T model, and can "guesstimate" from there what the big labs can reasonably serve for the same price. But yeah, it's been said for a while that big labs are ok on API costs. The only unknown is if subscriptions were profitable or not. They've all been reducing the limits lately it seems.

Is there evidence that frontier models at Anthropic, OpenAI, or Google are not using comparable optimizations to drive down their costs, and that their markup is just higher because they can?

It's because investors in OpenAI/Anthropic want to get their money back in 10 months, not in 10 years.

I haven't seen anyone claiming that API prices are subsidized.

At some point (from the very beginning until ~2025Q4), Claude Code's usage limit was so generous that you could get roughly $10-20 (API-price-equivalent) worth of usage out of a $20/mo Pro plan each day (2 * 5h windows), and for good reason: LLM agentic coding is extremely token-heavy, and people simply wouldn't come back to Claude Code if the provided usage wasn't generous or every prompt cost them $1. And then Codex started trying to poach Claude Code users by offering even greater limits and constantly resetting everyone's limits in recent months. The API price would have to be 30x operating cost to make this not a subsidy. That would be an extraordinary claim.
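The back-of-the-envelope version of that argument, using only the figures in the comment above (30 active days per month is my assumption):

```python
# If a $20/mo plan hands out $10-20 of API-price-equivalent usage per
# day, how large is the implied subsidy ratio at API list prices?
plan_price = 20                              # $/month, Pro plan
daily_usage_low, daily_usage_high = 10, 20   # $/day at API-equivalent prices
days = 30                                    # assumed active days per month

usage_low = daily_usage_low * days    # $300/month of API-priced usage
usage_high = daily_usage_high * days  # $600/month of API-priced usage

# Ratio of API-priced usage to subscription price: 15x to 30x.
print(usage_low / plan_price, usage_high / plan_price)
```

That 15-30x range is where the "API prices would need a 30x markup over operating cost for this not to be a subsidy" conclusion comes from.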

The claim that APIs are subsidized is very common.

eg:

> Token prices are significantly subsidized and anyone that does any serious work with AI can tell you this.

https://news.ycombinator.com/item?id=47684887

(the claims don't make any sense, but they are widely held)

I'll note that it's common and dangerous, in that there's a generation of engineers who are at risk of leading each other astray as to the economics, and therefore the probability distribution of outcomes, of some firms that will massively impact their careers.

I think I understand the major reasons for this meme, but I find it really worrying; there were lots of incorrect ‘it’s a bubble’ conversations here in 2012-2015, but I don’t think they had the pervasive nature and “obvious” conclusion that a whole generation of engineering talent should just, you know, leave.

Meanwhile I am hearing rational economic modeling from the companies selling inference; Jensen (a polished promoter, I grant you) says it really well: token value is increasing radically, in that new models -> better quality, and therefore revenues and utilization are increasing, and therefore, contrary to the popular financial and techbro modeling of 2023, things like A100s still cost quite a lot, whether hourly or to purchase. (!) Basically the economic value is so strong that it has actually radically extended the life of the hardware.

I just hate to imagine like half of the world’s (or US’s) engineering talent quitting, spending ten years afraid, or wrongly convinced of some ‘inevitable’ market outcome. Feels like it will be bad for people’s personal lives, and bad for progress simultaneously.

Yeah, subscriptions used to be extraordinarily generous. I miss those days, but the reinvigoration of open weight models is super exciting.

I'm still playing with the new Qwen3.6 35B and impressed, and now DeepSeek v4 drops, with both base and instruction-tuned weights? There goes my weekend :P

I mean, not one "bleeding edge" lab has stated they are profitable. They don't publish financials aside from revenue. And in Anthropic's case, they fuck with pricing every week. Clearly something is wrong here.

You know, if you don't have to pay insane salaries for your top engineers, and don't have to pay billions for internet shills to control the narrative, then all of the labs would be insanely profitable.

It's the decades of performance-doesn't-matter SV/web culture. I'd be surprised if over 1% of OpenAI/Anthropic staff know how any non-toy computer system works.

> I’d like somebody to explain to me how the endless comments of "bleeding edge labs are subsidizing the inference at an insane rate" make sense in light of a humongous model like v4 pro being $4 per 1M. I’d bet even the subscriptions are profitable, much less the API prices.

One answer - Chinese Communist Party. They are being subsidized by the state.


When China does it it's communism. When companies in the west get massive tax cuts, rebates, incentives and subsidies, that's just supporting the captains of industry.