I guess enjoy it while it lasts? OpenAI won't be able to subsidize that forever either.

Agreed. I think the Chinese labs are proving that OpenAI and Anthropic don't have a moat in almost every aspect, especially pricing. I also think people are getting annoyed with the constant lift and shift. I've seen more folks drop Claude Code and Codex, specifically, because of the lock-in it provides the providers. I'm curious to see how people standardize on tooling adjacent and if Anthropic, Google or OAI move to block utilization akin to the games Anthropic has been playing as of late.

I think the end game is routed model usage and SLMs. I think Apple is going to prove this in the consumer space pretty handily and I'm curious how the Android ecosystem responds since the hardware is considerably lacking in model performance. I think Apple has a huge opportunity here, as much as I don't like their current walled-garden ecosystem. They did position themselves very well with ARM and custom chips for their hardware. Hopefully the broader ARM and Linux ecosystem is able to make some headway and we see a more formalized, and broadly accepted, architecture to capitalize on.

is there an alternative to codex that “just works”? by just works i mean i can install as an app in 1 minute, and i get web search, skills, mcp servers, etc? Bonus points if it can control my chrome tabs like codex can, and if it offers remote control from my iPhone (chatgpt app) so i can kick off tasks while i’m out for a walk. Even more bonus points if i can, with 1 button click, share my chats or share the results of a session as a “site” (vercel style).

I’m sure you could put something similar together with a bunch of duct tape and 2 weeks of effort, but it won’t work nearly as nicely nor out of the box. so…what am i missing?

[flagged]

Big companies are not doing OpenRouter.

My company has an agreement with the big providers, and while I'm pretty sure they think about how to get budget back, it's a competitive advantage and normal people will not learn different model behaviours.

At least for now.

What lock-in does codex have? I'm using it in the pi harness specifically because it doesn't have much in the way of lock-in.

I see exactly the opposite. Chinese models fail under any complex scenario, while US labs raise the price; that's a sign of confidence.

> while US labs raise the price; that's a sign of confidence

Regardless of what others are doing, US labs here are just rushing to IPO. It's NOT a sign of confidence.

It's the equivalent of saying you have confidence in SpaceX making revenue by renting out their data center (instead of their AI making bank).

Going to IPO is a sign of confidence; you need to report a lot of things that private companies don't. This is the exact reason Chinese labs do not rush to go public. They wish to go, but the money flow is not as good.

On the same note, if spacex is doing datacenters on earth successfully what's wrong with that? They rented cloud infra to a #2 or #3 provider in the world after < 2 years in business. It's a success, no?

You’re not gonna get nuanced discussion on spacex or anything Elon related here these days. Most of this site is Reddit lite at this point including their milquetoast progressive opinions (Elon bad being one of them).

> if spacex is doing datacenters on earth successfully what's wrong with that? They rented cloud infra to a #2 or #3 provider in the world after < 2 years in business. It's a success, no?

If you get hired as a staff engineer and do the work of a junior, what's wrong with that?

Clearly xAI (now part of SpaceX) did not raise funds to be a data center. The margins are way different. There are plenty of recent IPOs in that area that are worth at most billions, not trillions.

> Going to IPO is a sign of confidence; you need to report a lot of things that private companies don't.

This isn't going to IPO. This is rushing to IPO. It is a sign of confidence that the market or wider environment might crash soon so we need the liquidity now.

> This is the exact reason Chinese labs do not rush to go public.

Maybe or maybe not. If you are referring to Chinese labs - both the Hong Kong and China stock markets are way weaker than Nasdaq. It's not comparable. Check all the recent Hong Kong IPOs that have tanked.

So no, the reason not to might just be: no money in it.

Running that much compute at scale is not a junior task. Weird analogy.

I don't think anyone has a firm grasp on actual inference costs -- including the research and training that has gone into those models. We've got near-frontier capabilities from open source models from China at pennies on the dollar compared to US big tech rollouts. OpenAI and Anthropic are heavily subsidizing their inference -- no wait, they are charging the most they can get away with before going public. Where is the truth?

> I don't think anyone has a firm grasp on actual inference costs.

There are huge numbers of users (myself included) that do have an exact idea of what inference costs are - on open models. We can buy tokens from 3rd parties that have no motivation to subsidize our use. That's to say, there's a fair marketplace[1] and we're hanging out there.

If you want to say "I don't think anyone has a firm grasp on actual inference costs on these proprietary/closed models", then I could agree with that.

[1]: https://openrouter.ai/rankings#leaderboard

Both can be true. They can be charging what the market will bear, and still be charging less than their costs of running it.

There is no way I'm believing DeepSeek can charge less than $1 USD for their pro model while Opus costs over 25x more, yet their price is less than the cost of running it?

It would seem strange if they were operating in the same economy, but they aren't. DeepSeek operates in an economy with a high degree of central planning.

China subsidizes strategic industries, and they have heavily done so with AI. And DeepSeek specifically has said they have no commercialization plans.

For example: https://www.boc.cn/aboutboc/bi1/202501/t20250123_25254674.ht...

DeepSeek is not the only provider of inference for their models. Chinese subsidies likely do explain DeepSeek's ability to provide inference cheaper than other providers, but even a US provider like DeepInfra can serve DeepSeek 4 Pro at $1.30/M in and $2.60/M out. Unless American labs are doing something wildly inefficient, it feels safe to assume Anthropic has some profit margin on inference at API prices.

They may, neglecting overhead R&D. But also, some suspect that US models are significantly heavier than DeepSeek in resource consumption by multiple measures

It’s generally established that Anthropic/OpenAI are going for all-out performance with big VC dollars at the expense of efficiency, while China has geopolitically limited compute and an incentive to compete on value per dollar.

[deleted]

> There is no way I'm believing DeepSeek can charge less

Why not? Hetzner charges WAY less than AWS too. Can you not believe that?

That's the point. Hetzner is presumably covering their costs, so it's a safe bet that AWS is profitable.

> OpenAI and Anthropic are heavily subsidizing their inference -- no wait, they are charging the most they can get away with before going public. Where is the truth?

Both. They are charging the most they can get away with and that amount is still heavily subsidized by VC capital.

> I don't think anyone has a firm grasp on actual inference costs -- including the research and training that has gone into those models

We know roughly how much these companies spend and what their revenues are. Based on that, they'd have to more than double revenue (without spending more money) just to stay even, and that's not good enough given how deep in the hole they are.

> OpenAI and Anthropic are heavily subsidizing their inference -- no wait, they are charging the most they can get away with before going public. Where is the truth?

Both are true. I mean, I'd be willing to spend a bit more than I do now, but not more than double, and neither would most companies. The company I work for is currently investigating how to reduce LLM spend, not looking to spend more.

We pay by token at work. I just finished one session with Opus that was 4000 dollars. In about three days.

Now that 200USD subscription starts to feel cheap...

That would be about ~300 tok/s over 72 hours at Claude Fable output token prices? I'm not sure that this passes a sanity test.
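For anyone who wants to redo that sanity check, here is a rough sketch of the arithmetic. It assumes the whole bill is output tokens billed at one flat rate, which is a simplification (input, cache reads and subagent traffic are ignored), and the function names are made up for illustration. No real price list is used; the price is either passed in or derived from the figures in this thread.

```python
SECONDS_PER_HOUR = 3600


def implied_tokens_per_second(spend_usd: float, hours: float, usd_per_mtok: float) -> float:
    """Average output rate needed to reach spend_usd in the given time.

    Assumes every dollar went to output tokens at a flat usd_per_mtok rate.
    """
    total_tokens = spend_usd / usd_per_mtok * 1_000_000
    return total_tokens / (hours * SECONDS_PER_HOUR)


def implied_price_per_mtok(spend_usd: float, hours: float, tokens_per_second: float) -> float:
    """Flat price per million tokens implied by a spend, a duration and a rate."""
    total_tokens = tokens_per_second * hours * SECONDS_PER_HOUR
    return spend_usd / (total_tokens / 1_000_000)


if __name__ == "__main__":
    # Figures from the thread: $4000, about three days, ~300 tok/s.
    price = implied_price_per_mtok(4000, 72, 300)
    print(f"implied price: ${price:.2f}/MTok")
```

With the thread's numbers, 300 tok/s sustained for 72 hours is about 77.8M tokens, so $4000 works out to roughly $51 per million tokens under these assumptions.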

Subagents are a helluva drug.

That's the price of several engineers!

Just outta curiosity, as I’ve never gotten a spend anywhere near that, what variant were you using? Like max context window and fast mode? Or was it just chugging along non stop for three days?

Fast mode, max context window. The task was: replace all 1600+ queries from one database to another and make the whole integration test pass. We did multiple passes, with different concerns when changing from one database to another. My OpenCode session right now says $4,365.02.

I haven't gotten close to this either before, but now we wanted to move fast because this branch gets conflicts all the time and we want to get the migration over with asap.

It's a bit of a left-field question, but I am curious: let's say the company wasn't paying the whole bill but only subsidizing it, e.g., if it paid 90% of the $4000. What would you do?

I don't know, why would I pay to do my job? It's not my first database switch for a startup. Only this time it doesn't take two months of grueling work. I know exactly how this is done, but the amount of grunt programming and testing and repetitive work is just not great. And it's not a task that brings new customers or a new product. Just a mandatory and annoying thing to deal with when we are growing.

And don't get me wrong. Opus did an absolutely horrible job at first, second and third round in this task. You really needed to steer it to get to the right solution.

And now Fable is out. And its first round of code reviews for this huge PR was definitely worth the money too...

Don't think that I'm just shrugging at that number. I see it every day, and I don't like that it's in the thousands now. But for people paying for the 100 or 200 dollar plans, I'm not super sure you will be able to use them in the future if the token price is in the thousands for a slightly bigger task...

If I had to pay this from my own pocket, I'd definitely go with DeepSeek or local models and figure out how to make the best use of them.

> If I had to pay this from my own pocket, I'd definitely go with DeepSeek or local models and figure out how to make the best use of them.

IOW, you don't really think the value of this work is really worth $4k.

> why would I pay to do my job?

The question is: how long do you think that your employer will be willing to pay for you and Anthropic, if you yourself said if it were your money you'd put some time and effort to work with an open model?

> The question is: how long do you think that your employer will be willing to pay for you and Anthropic, if you yourself said if it were your money you'd put some time and effort to work with an open model?

I wonder what this question really means? Anthropic is useless if you don't know what to do with it. It's very useful if you do, and you can guide it to do the right things. Yes, it will for sure reduce the number of people we need to hire. But we are always looking for hires who know what they're doing and can utilize agents to be faster.

But if you think about how long an employer is willing to pay 10-20k per month per seat for Anthropic? I can't see this being feasible and it will have to end at some point.

Regardless of the actual value produced by the models, if I am the CTO of any company that has the budget to spend $10k/month/seat on Claude, I'd take 5%-10% of that to build an alternative in-house.

I'm with you here. We can't slide into a situation where you put a sizable amount of your budget for an American mega corporation if you want to survive in the competition. We need local models and we need them to be good enough to help us.

Indefinitely for these big mundane grunt jobs. In every scenario it is going to be cheaper and faster than lobbing it to Infosys.

We have a firm grasp on actual inference costs from the various open weights model providers on OpenRouter. They don't have the money to subsidize inference and it's quite a competitive market, so the prices are representative of the costs.

[flagged]

regardless of whether that's true or not, US companies doing hosted inference of the models coming out of China are also significantly cheaper than those from OpenAI or Anthropic

Not relevant to the post.

I'm planning on switching from the $20/month to the $100/month plan.

It's worth it, and I can afford it, but I am not really the right type of user for token-based usage. It's all for personal and free work.

Just a personal anecdote but I have not hit any more thresholds or limits since switching to the MAX plan and so far, it's been worth it. But I do wonder how long even this will last...

I tried ultracode today on the max pro plan. An hour and a half in was all I lasted. Giant review on an entire six month old code base. It found 61 bugs, about ten were notable. Pretty impressed.

Ultracode destroys your limits and I have not found it to be worth it in the slightest, just fyi. I haven’t found any improvement over a local Claude code instance set to opus max.

I think subscription models are sustainable, but longer term, we should probably expect to see more prompt optimization happening in the providers' inference pipelines. For example, unless you explicitly tell the agent or API to use a specific model, fronting the inference layer with a caching prompt classifier to determine which model to use, and automatically selecting the lowest-cost model, would probably already save a lot of money (IDK if Claude/OpenAI do this on the backend, but several services I have worked on do some things like this to reduce the cost of delivering customer-facing inference at scale).
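To make the idea concrete, here is a minimal sketch of a caching prompt classifier in front of a set of models. Everything in it is an assumption for illustration: the tier names, the per-MTok prices, the keyword list and the length threshold are placeholders, and the keyword heuristic stands in for whatever real classifier a provider might use. It is not a description of how any provider actually routes requests.

```python
from functools import lru_cache
from typing import Optional

# Hypothetical model tiers, cheapest first. Names and prices are placeholders.
MODEL_TIERS = [
    ("small-model", 0.25),
    ("mid-model", 3.00),
    ("large-model", 15.00),
]

# Toy signal for "this probably needs the big model" (assumption).
HARD_HINTS = ("refactor", "migrate", "prove", "debug")


@lru_cache(maxsize=4096)
def classify(prompt: str) -> int:
    """Return an index into MODEL_TIERS; identical prompts are served from cache."""
    text = prompt.lower()
    if any(hint in text for hint in HARD_HINTS):
        return 2
    if len(text.split()) > 50:
        return 1
    return 0


def route(prompt: str, pinned_model: Optional[str] = None) -> str:
    """Pick the cheapest tier the classifier allows, unless the caller pinned a model."""
    if pinned_model is not None:
        return pinned_model
    return MODEL_TIERS[classify(prompt)][0]


if __name__ == "__main__":
    print(route("What is the capital of France?"))
    print(route("Please migrate these 1600 queries to the new database"))
    print(route("hi", pinned_model="large-model"))
```

The explicit `pinned_model` argument mirrors the "unless you explicitly tell the agent or API to use a specific model" condition: routing only kicks in when the caller has not chosen.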

> fronting the inference layer with a caching prompt classifier to determine which model to use, and automatically selecting the lowest-cost model, would probably already save a lot of money

Unfortunately, that doesn't work within a single session. The K-V cache of a model is intertwined with the model's configuration. Switching models invalidates the cache, meaning everything up to the point of the switchover is processed like a new, uncached input token.

Per Anthropic's pricing doc, an Opus 4.8 cache hit costs 50¢/MTok, while Haiku costs $1/MTok for uncached input.
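As a rough illustration of that comparison, the sketch below prices a single re-read of the existing context under both options, using only the two per-MTok figures quoted above. It deliberately ignores output tokens, cache-write charges, and later turns where the smaller model's own cache would be warm, so treat it as a one-turn estimate under those assumptions rather than a full cost model. The function names are made up.

```python
# Per-MTok figures as quoted in the comment above.
OPUS_CACHE_HIT_USD_PER_MTOK = 0.50
HAIKU_UNCACHED_INPUT_USD_PER_MTOK = 1.00


def context_read_cost(context_tokens: int, usd_per_mtok: float) -> float:
    """Cost of feeding context_tokens of prior conversation to a model once."""
    return context_tokens / 1_000_000 * usd_per_mtok


def switch_penalty(context_tokens: int) -> float:
    """Extra cost on the turn where the session moves to the smaller model.

    Positive means switching costs more than staying for that one read.
    """
    stay = context_read_cost(context_tokens, OPUS_CACHE_HIT_USD_PER_MTOK)
    switch = context_read_cost(context_tokens, HAIKU_UNCACHED_INPUT_USD_PER_MTOK)
    return switch - stay


if __name__ == "__main__":
    tokens = 200_000  # example context size, chosen arbitrarily
    print(f"penalty for switching: ${switch_penalty(tokens):.2f}")
```

For a 200k-token context, that is $0.10 to stay on the cached large model versus $0.20 to re-read everything uncached on the small one, which is why mid-session switching can cost more on that turn than it saves.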

Model selection works best if sessions are short and self-contained, particularly if the first few interactions can reliably classify the model need. That probably covers most 'support chatbot' use-cases, but it doesn't describe the kinds of heavy agentic automation that really chews through token budgets.

There is a definite financial incentive for people smarter than me to solve the problem, and I don't generally bet against businesses finding ways to reduce costs :)

> The K-V cache of a model is intertwined with the model's configuration.

I don't think this is true if you simply quantize the model or run it with fewer active experts? The underlying weights would stay the same. You could also play further tricks with skipping some of the model's middle layers outright, which works surprisingly well due to how skip connections are used.

ChatGPT does this and codex will eventually. They’ve stated it’s the future.

I have the $100 plan and had almost never run out of credits until I started using the ultracode / workstreams feature w/ Opus 4.8, at which point I managed to blow the full 6 hour allocation in like 20 minutes or so. In fairness, it did some amazing things with the extracted information, but it also strongly suggested that I'd need the $200 subscription *plus* a budget for extra usage.

Instead pay for 3 Chinese models. No max out ever then. I pay for kimi, DeepSeek and Claude. Whenever Claude decides it's over, I can safely continue on very cheap plans.

My bet is they'll keep subsidizing for a considerable period of time, at least 1-2 decades more.

Most AI companies are just testing the waters with paid tiers right now, their greatest fear with increased pricing is folks reverting back to wikipedia, stack-overflow and other public domain organic activity buzzing back to life; that will kill any RoI potential in LLMs forever. They're playing the wait game instead, observing how the digital sphere reacts to every little increase in price.

If that weren't the case, they'd be pricing at lucrative premiums already, and they'd even have gotten away with it in the short term considering the increased dependency in the enterprise world. But that'd be like killing the goose that lays the golden eggs too soon and losing all long-term potential.

Once the folks are so addicted to LLMs that even writing a hello world program sounds like a nightmare and coming up with an article draft feels like reinventing Egyptian glyphs, that's when the real pricing hammer will come.

Anthropic and OpenAI won't be around in 1-2 decades if this is their long term plan. People are not going to revert, but go elsewhere. China is proving that it can be done cheaper.

1 decade = 10 years ...

Oh for sure. I've been hopping around from provider to provider for the last few years just depending on who has the most capable / subsidized plans at the moment. I definitely expect there will be a squeeze on subscription costs all around the industry post IPO.

A few weeks ago they massively cut usage on free tier.

Nothing is subsidized. Subscriptions are profitable for both Anthropic and OpenAI.

Anthropic wanting to switch billing to API rates is them just wanting to generate more profit.

> Nothing is subsidized. Subscriptions are profitable for both Anthropic and OpenAI.

Even if subscriptions are locally profitable (i.e., the cost of the subscription covers the cost of inference), they're still subsidized because they don't cover training and running the company; otherwise, these companies would be profitable.

I can see that being true, and it very likely is true. But isn't infinite VC money and no incentives to optimize operations the reason behind that?

Take a look at China for example - they have no access to NVIDIA, so they're trying to build their own hardware, they have no unlimited funding, so they try to optimize things.

And Anthropic is the complete opposite of that - if NVIDIA were to triple their prices tomorrow, Anthropic would still pay them.

In the end, either we all somehow go mad and start paying Anthropic tens of thousands of dollars per month to support this madness, or we will go with whoever isn't lighting cash on fire.

> Take a look at China for example - they have no access to NVIDIA

Not true. Stop following US media spam if needed.

1. Very recently, the US did close a loophole on sanctions that allowed Chinese companies to use NVIDIA hardware outside of China i.e. before that was closed they all had access. The trick was train outside, do adjustments, ship the disks back and use non-NVIDIA in China, but at least the training and endpoints not hosted in China could all use NVIDIA.

2. There have been plenty of reports, including fines and bans, e.g. to Supermicro, on smuggling NVIDIA hardware to China. I doubt it has been stopped. You can't catch everyone.

"Nothing is subsidized" is a wild take. They might be making money on some users, perhaps even most users, but certainly not all. Also, "subsidized" doesn't just mean on compute.

That's interesting. Do you have anything to back that claim up?

I do, and it's called DeepSeek's pricing table. At the same time, the "subscriptions are subsidized" cohort has no data whatsoever, and yet they're in every thread.

Granted, it could still mean that Anthropic just chooses to lose money - but that's Anthropic's choice.

DeepSeek has proven that inference can be much, much cheaper than what Anthropic advertises on their API rates page.

> Granted, it could still mean that Anthropic just chooses to lose money -

Then the cost is being subsidized by investor capital, but it is still subsidized.

and soon by everyone who is invested into the NASDAQ, some sort of exit scam, but with a real product though

"Nothing is subsidized"

So they are profitable?

I think you are mismatching accounting terms.

You can't say the 'subscriptions' are profitable without accounting for the cost of making the model that is the source of the subscription.

They are heavily subsidized by the shareholders. Investing, running at a loss, with hope of some future profitability.

And yet, that is completely uninteresting to their user base.

If a saner factory can sell you the same tool at a fraction of the cost of a gold-plated factory, your choice is going to be obvious.