Here is a trend I'm noticing:

- GPT-5 mini costs $0.25/$2 and will be discontinued in December.

- GPT-5.4 mini costs $0.75/$4.5 and is supposed to be the replacement.

- GPT-5.4 nano costs $0.2/$1.25 and, while it ranks better in benchmarks than GPT-5 mini, it's not even close when you test it in real scenarios.

So you're left being forced to go to GPT 5.4 mini if you use 5 mini today.
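
To put rough numbers on that jump, here is a minimal sketch of the arithmetic. It assumes the prices quoted above are USD per 1M input/output tokens (the usual convention, though the comment doesn't say so), and the monthly token volumes are made-up placeholders.

```python
# Prices as quoted above; assumed to be USD per 1M tokens (input, output).
PRICES = {
    "gpt-5-mini": (0.25, 2.00),
    "gpt-5.4-mini": (0.75, 4.50),
    "gpt-5.4-nano": (0.20, 1.25),
}


def monthly_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a workload at the listed per-1M-token prices."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


# Hypothetical workload: 500M input tokens and 50M output tokens per month.
old = monthly_cost("gpt-5-mini", 500_000_000, 50_000_000)
new = monthly_cost("gpt-5.4-mini", 500_000_000, 50_000_000)
print(f"GPT-5 mini: ${old:.2f}, GPT-5.4 mini: ${new:.2f}, ratio: {new / old:.2f}x")
```

For this placeholder mix the bill goes from $225 to $600, about 2.67x. The input price alone triples and the output price rises 2.25x, so the exact ratio depends on how input-heavy your workload is.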

The same thing is happening here as their “Luna” model will cost $1/$6.

Can't we just stay with the models we actually want? I don't need GPT 5.4 mini. GPT-5 does the job.

Maybe it’s the realization that it was never that cheap in the first place and they're forcing us to upgrade in a slow and painful way.

If you have no need for Anthropic/OpenAI's frontier model capability, you may be better served with an open-weight model that can't be taken away.

Edit:

> GPT-5 does the job.

I bring up DeepSeek V4 Flash a lot on HN, but I want to mention that according to Artificial Analysis, it trades blows with GPT-5 (high) (from August, 2025) [0]

[0]: https://artificialanalysis.ai/models/comparisons/deepseek-v4...

We rolled out DeepSeek V4 Flash to our customers and it was an absolute disaster, unfortunately. It was not able to follow simple commands, always "forgot" to do things, lied consistently about its work, and so on. It was pretty good though on one-off work, like summarizing something or executing simple commands, so we are now experimenting with using it for subagent work with clear instructions and handoff.

DeepSeek V4 Pro, on the other hand, is a really really good main driver and we have a lot of success using it. It's not Opus or GPT-5.5 level but on its way. Kimi 2.6 as well btw.. so there is already quite some choice.

Your experience with DeepSeek v4 Flash differs from mine: while I usually use DeepSeek v4 Pro (that is also inexpensive), I find using DeepSeek v4 Flash with the Fireworks.ai API and properly configured OpenCode to be very good for routine work, and it is pleasantly very fast. Admittedly I use DeepSeek v4 Pro for difficult problems.

I encourage people to do a quick evaluation at least once a month with their own problems and workflows. Estimate cost as both what the inference tokens cost for a task and how much human effort it takes to get the required results.
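
One way to do that accounting, as a rough sketch: add the token bill for a task to the cost of the human time spent steering and fixing the result. Every name and number below is a hypothetical placeholder, not a measurement.

```python
from dataclasses import dataclass


@dataclass
class TaskRun:
    """One evaluation run of a model on one of your own tasks."""
    model: str
    input_tokens: int
    output_tokens: int
    price_in_per_m: float   # USD per 1M input tokens (assumed unit)
    price_out_per_m: float  # USD per 1M output tokens (assumed unit)
    human_minutes: float    # time spent prompting, reviewing, and fixing

    def token_cost(self) -> float:
        return (self.input_tokens * self.price_in_per_m
                + self.output_tokens * self.price_out_per_m) / 1_000_000

    def total_cost(self, hourly_rate: float) -> float:
        """Token cost plus the cost of the human effort at hourly_rate USD/h."""
        return self.token_cost() + self.human_minutes / 60 * hourly_rate


# Hypothetical runs of the same task on a cheap and an expensive model.
runs = [
    TaskRun("cheap-model", 200_000, 20_000, 0.20, 1.00, human_minutes=30),
    TaskRun("pricey-model", 200_000, 20_000, 2.00, 10.00, human_minutes=10),
]
for run in sorted(runs, key=lambda r: r.total_cost(hourly_rate=60.0)):
    print(f"{run.model}: ${run.total_cost(hourly_rate=60.0):.2f}")
```

With these placeholder numbers the human time dominates: the model with the 10x higher token price still comes out cheaper overall because it needed less babysitting.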

I disregard benchmarks.

We are also using Fireworks as our model provider. Our harness is openClaw, so the tasks are not only coding but all kinds of things. For instance, I asked it to fetch some info from the web via the Chrome browser and collect it in an MD file. The file never appeared, even though it claimed to have written it. I asked it three times to write the MD file and it was always: “oh yes, I do it now..” then nothing. The search itself was also very bad, because it just gave up after one page, hallucinated an answer and - even worse :-) - told me it was very thorough…

Pro aced the task :-)

But maybe it's a config issue.

Have to ask: did you try 'xhigh' thinking effort with Flash? I also found it nearly unusable on just 'high', but on 'xhigh' it's nearly equivalent to Pro's 'high'.

That sounds correct: Pro for longer agentic tasks, Flash is fine for writing short programs, finding things for me in a large code base, etc.

I found Flash to be a bit shaky as well until I started using it in xhigh/max thinking effort, then it became my daily driver. It runs quite well on a couple of DGX Sparks.

I still wish it was a little better, but there's hope for another model checkpoint (maybe with some of GLM 5.2's goodness distilled into it, that would be nice).

> I found Flash to be a bit shaky as well until I started using it in xhigh/max thinking effort

This is true for most of the open-weight Chinese models, to be fair. They're really built around long reasoning chains.

Also you're making me want a second Spark-alike :') but they're so expensive...

DeepSeek V4 Pro is only ~3-4x as expensive as Flash. It won't replace GPT-5.5 (nowhere near), but I've been using the $20 sub to punch through tough cases and Pro for the rest.

DeepSeek's privacy policy for their API has no section about training. They are 100% training on every single word you give it.

If your customers are fine with that, your IP is not interesting, then you can use it.

Though with open models you have a lot of choice where to get it from. I see like ~15 providers here with various logging/ZDR policies, so pick whatever mix of price to features you want:

https://openrouter.ai/deepseek/deepseek-v4-flash

I don't believe a single word from AI companies, no matter where they are from. Sourcing their training data is run like a genuine criminal enterprise - last year Anthropic settled for 1.5 billion, and if they settled so quickly it might mean what we would have seen in court is even worse.

You can use deepseek through opencode, which says its providers have a no-retention policy.

You don’t have to access Deepseek through Deepseek. You can self-host it and your data never leaves your premises.

I self-host Flash actually, but yeah.

When I use their API I use it knowing that they probably train on the data, and knowing that it's probably used to improve future iterations of their models.

But I use their API extremely rarely lately, because local Flash is good enough for me the vast majority of the time

And you’ve opened wireshark and verified the model is sending absolutely nothing? Not caching and sending later, etc?

If you self host then you can audit the open-source llama.cpp or whichever other program you are using for inference, to see exactly what it does, and also whichever open-source harness you use for implementing a coding assistant or other agentic workflow.

The model consists of a bunch of data files, it does absolutely nothing by itself.

If you run inference on your own hardware, you have absolute control on how the LLM is used, not like when you use an external service provider.

Not sure if you mean something else, but the model itself is not able to send anything.

It’s my daily driver in opencode

Unless you are hosting it yourself on your own infrastructure it absolutely can be taken away.

For all intents and purposes you'll be able to move an open weight model wherever you want.

I really dislike this rhetoric, you sound like the FSF guys who are like "you're not free until you're running coreboot with zero binary blobs". Sure they have a point but also, most people are fine running regular linux.

Reading your comment made me realize that I love that the position of the FSF is held by someone, in the interest of stretching the Overton Window to that side.

Very much with you on that. It’s not a position I personally hold by any means, but I appreciate its existence connected to a prominent long-standing organization.

Most FSF guys actually have very nuanced views on the topic and you’re doing everyone a disservice by reducing it to an extremist sound bite.

That's literally the official FSF position.

https://www.fsf.org/resources/hw

> For example: the Free Software Foundation only purchases desktop machines which support Libreboot, and Thinkpad X200 and X60 laptops with Libreboot. All desktops and servers we buy are KGPE-D16 motherboards, which are supported by Libreboot. As a result, all of the workstations used by the FSF staff have a free BIOS.

https://www.gnu.org/distros/common-distros.html

> Except where noted, all of the distributions listed on this page fail to follow the guidelines in at least two important ways:

> ...The kernel that they distribute (in most cases, Linux) includes “blobs”: pieces of object code distributed without source, usually firmware to run some device.

They are extreme, uncompromising, and live by their principles.

They are also the reason you can buy a computer meeting those requirements instead of being a pipe dream.

Damn, that's awesome. I suddenly feel like replicating their setup and seeing how it goes.

For even more of a challenge, try replicating Richard Stallman's personal setup:

https://stallman.org/stallman-computing.html

> They are also the reason you can buy a computer meeting those requirements

The latest libreboot-compatible laptop I could find, at https://libreboot.org/docs/install/t480.html, is from 2018 -- not sure if that would still be available?

Thankfully he didn't say that they're all like that. Instead he pointed out the few that are as a well known example of similar behavior.

If you reread the comment with a fresh mind you'll notice that you misunderstood what he wrote

When attacking archetypes of people, there is some responsibility to make clear who you’re attacking and why, even to someone who’s not being hyper-open-minded. At least if you want them to learn from you: which may or may not be your goal. When you attack/signal you’re on the offensive, it is foolish to believe that they won’t knee-jerk attack back and become closed minded at least a little.

Regardless, the “misinterpretation” of the parent comment is actually a plausible interpretation. I suspend my judgement on what the actual “correct” interpretation of the original comment is: there are too many plausible interpretations to deductively decide. But I do know that since the first comment brought up a contentious issue, its author should have put more work into crafting the message so there aren’t so many plausible interpretations that contradict each other. Or alternatively, they should have specified who they were talking about precisely enough to leave no doubt. That is, if the commenter cared to be properly interpreted, but that may not be their goal. There are many reasonable reasons why that wouldn’t be their goal.

You used a lot of words to defend a strawman argument

As you ironically strawman me. Your hypocrisy knows no bounds!

When you read someone's comment there is some responsibility to read the words they wrote and not attempt to attack them for an argument no reasonable person would extract from those words.

Reasonable people could interpret the original comment in many other ways than was probably intended.

I like when people are open minded to people who are closed minded/attacking them. It’s an admirable and difficult trait to attain. But to expect that from others is foolish. Most people can’t stay objective/curious after being punched in the face.

Angry girlfriend SMS essay

You have a lot of angry girls texting you?

It is the FSF itself who has these extremist views.

Unless the US Gov bans inference companies from serving Chinese models to US customers...

good luck doing it to inference companies in singapore or the netherlands. or one of the decentralized networks that don't look useful right now. the world is already sick of america acting like it can do whatever and force its rules on the rest of us.

Still, with the same model being served by multiple providers, it is much less likely to disappear entirely, even if you would like to keep using a cloud provider. Worst-case scenario, you change providers. Or you use OpenRouter as a proxy.
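
The "worst case, you change providers" idea can even be automated. Below is a minimal sketch of trying providers in order; providers are modeled as plain callables, and the provider names and stub functions are hypothetical stand-ins rather than any real client library.

```python
from typing import Callable, Sequence

# A provider is modeled as a plain callable: prompt in, completion out.
# Real clients would wrap an HTTP API; these names are hypothetical.
Provider = Callable[[str], str]


def complete_with_fallback(
    prompt: str, providers: Sequence[tuple[str, Provider]]
) -> tuple[str, str]:
    """Try each provider in order and return (provider_name, completion)."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # a real client would catch narrower errors
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


def discontinued(prompt: str) -> str:
    """Stub for a provider that has dropped the model."""
    raise ConnectionError("model no longer offered")


def still_serving(prompt: str) -> str:
    """Stub for a provider that still serves the model."""
    return f"echo: {prompt}"


name, text = complete_with_fallback(
    "hello", [("provider-a", discontinued), ("provider-b", still_serving)]
)
print(name, text)
```

This only works because the same open weights sit behind every entry in the list; with a closed model there is no second provider to fall back to.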

There is actual market competition to host open models. If one provider stops offering a model you likely can find another provider that will

But you have multiple providers, not just one.

And every single one of those providers would buckle under government pressure.

Fable itself is hosted on all major cloud providers. How many offer it today?

This seems a little fanciful.

There's really no comparison between a model that Anthropic allows Google and Amazon to host with one that has been downloaded hundreds of thousands of times and has dozens of public inference providers.

I don't think they "allow" Google or Amazon to host them so much as Anthropic itself is deploying and managing their services on multiple cloud providers just like every other global scale business. Even the models served via OpenRouter are just being routed to compute under Anthropic control. Same with OpenAI. They aren't going to hand the world's most valuable intellectual property at the moment to some third party to run independently.

Now for the Chinese models on OpenRouter, yea. Those providers could be legit. Or it could be a failed crypto mining operation pivoting to providing AI compute. Who knows.

The providers on OpenRouter are not all in the US.

That doesn’t mean they are immune to US laws. If they want to continue to operate in the largest market in the world they will fall in line.

And if you are a legit American business you aren’t going to illegally bypass import/export controls.

More importantly, the download is out there. You can download it yourself today, and if it's that important to you, you can buy the hardware too.

I'm sure he's referring to the tightening of internet controls around social media as an extrapolation to controlling websites, etc.

Even in that case it can't be taken away; GPT and Claude are banned in China yet there's still a huge black market for tokens.

No. As long as you downloaded the weights, you can run them somewhere.

No, it can't. You can take it wherever you want. It is yours, not theirs.

>Unless you're running Linux yourself, it can absolutely be taken away.

Yes. The difference is obviously that full, fat Linux runs on a superset of anything a layperson would call a computer, and can be built from source on roughly the same set of hardware. The full, fat DeepSeek (as in the 1.6T model, unquantized) is too big to run on anything a layperson would call a computer, and actually building it is even harder.

It's famously difficult to find people willing to rent you time on big computers over the internet.

Other people's computers famously can't be taken away.

You're right, there is an all powerful wizard who can take away all the world's computers. You got me.

There's a reason Yudkowsky only thinks AI can be stopped by literal missile strikes against data centers.

Popular open models on Openrouter have dozens of providers.

I just want personal agency

DeepSeek V4 Flash is actually useless. Sorry, I've tested it after seeing so many comments like these. On OpenRouter, when trying to get it to output tool calls for creating tables, instead of providing the structured output correctly it was sending me people's Dropbox links and other image-sharing site URLs that led to pictures of random tables...

LLMs seem to only impress a certain type of person. Hint: this type of person was also really excited about NFTs.

It’s the same as the SaaS model. Price keeps going up, and to justify it they keep forcing you to upgrade to new versions with features that nobody asked for.

“More intelligence” is the new feature. Almost everyone is asking for this.

Citation: have you looked at OAI and Anthropic’s customer growth numbers?

Every use case of every customer doesn’t need more intelligence. I’m willing to bet that the vast majority will be perfectly fine running on “low intelligence” at a cheap price forever.

I for sure agree that plenty of current use-cases are solvable by non-frontier models.

However, you said “new versions with features that nobody asked for”, and I would prefer that you concede the point before shifting to arguing a new point.

What customers are asking for is smarter models. Because the tasks that only smarter models can solve are higher value, higher margin, than the tasks that non-frontier models can solve.

What are you talking about?

Prices of the lowest tiers of models have fallen by how much, 10-100x, over the last two years?

And actually, the model quality you needed to pay for in the past, you can just run on device now essentially for free.

I've struggled with this. You definitely can have great cheap models. There are many of them open source and served profitably by neo-clouds. The big labs have basically given up on cheap models, and it is frustrating. It means applications are not likely to build as much on them anymore (we are shifting workloads from Haiku/Sonnet to Deepseek v4, for example).

I suspect the problem is that they need to charge a lot to keep revenue numbers up, and they are more worried about cannibalizing themselves than others cannibalizing them.

Good observations. There's definitely a trend in pricing increasing but also balanced by innovations and availability of other models (both open and closed) emerging as alternatives. It's natural for the labs to explore how much they can push pricing, and for competitors to explore how they can treat that margin as their opportunity to grow their business.

Eventually the pricing should be more stable.

> Eventually the pricing should be more stable.

Why do you think so? This game can be played forever, you just need strong marketing and orgs gullible enough to pay a higher price for a minor upgrade.

It's happening to Anthropic Haiku and Gemini Flash/Flash Lite. All of them are increasing prices and deprecating cheap models.

Each model release gives an opportunity to reduce the number of old models still on offer, and charge a higher, less-subsidized tier. The trick is to charge a subsidized price that is less than an M3 Ultra, so they continue paying you rent, instead of a one-time fixed cost. So far open models can't compete with Opus 4.5, but as soon as they can, people will be looking at buying devices that can run that model locally.

We are a claude shop but we already bought two mac studios to start migrating less complex but still agentic workflows there. We will break even on those in less than a year.

Breaking even in less than a year? What's the math on that?
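
The poster's actual figures aren't given, but the general shape of such a calculation is simple. A sketch with entirely made-up numbers:

```python
def break_even_months(hardware_cost: float,
                      monthly_api_spend_replaced: float,
                      monthly_running_cost: float) -> float:
    """Months until a one-time hardware purchase pays for itself."""
    monthly_saving = monthly_api_spend_replaced - monthly_running_cost
    if monthly_saving <= 0:
        raise ValueError("no saving: the hardware never breaks even")
    return hardware_cost / monthly_saving


# Hypothetical: $20,000 of hardware replaces $2,500/month of API usage and
# costs $500/month in power and upkeep.
print(f"{break_even_months(20_000, 2_500, 500):.1f} months")
```

With these placeholders the purchase pays off in 10 months. The answer is very sensitive to how much API spend the local machines really displace, which is the number worth asking about.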

On Nano "it's not even close when you test it in real scenarios" - what have you seen? What kind of things can GPT-5 Mini handle that GPT-5.4 Nano cannot?

We’re using GPT-5-mini in an enterprise data-processing workflow, and we too see that GPT-5.4 nano performs materially worse for our requirements, roughly 30% worse as measured through our test suite.

Also can confirm gpt-5.4-nano was unable to even keep up with 4.1-mini. Had to move off of OpenAI once 4.1-mini was retired

5.5 is smart enough for 99% of my tasks. I need that level of intelligence at ever decreasing prices.

Why not self host or go to openrouter if you don't need SOTA frontier?

Hardware hosting old models isn't hosting new models. If you want consistent models, host your own open weights ones.

I don't know about Cursor or other outlets, but I use GPT 5.4 exclusively in Windsurf (Sorry, Devin!), and it's a very capable model that doesn't break the bank!

> stay with the models we actually want

If you want control over the models you use, you have to self-host.

I think it's more that they're abandoning simpler AI tasks to Chinese models. Qwen 35b and DeepSeek Flash are better than GPT-5 mini on my tasks and way cheaper.

discontinuing the cheaper options is a risky move for openai

will trigger re-evaluations of models by other labs + inference providers

I can speak for myself. We are exactly at this moment trying to replace GPT 5 mini with an open weight / open source model. No luck so far.

No. Welcome to the wonderful world of SaaS. If you want your gui, your terms, your software, self-host.

But I think, in time, a new generation will relearn this truth.

Yeah, this is the classic Silicon Valley strategy of selling at a loss and then inflating prices once they have captured the market.

See Uber, Netflix, etc.

I don't see them capturing anything at this point. If inference was profitable then they could compete on price/model and capture the market. Then increase price and pay back the model training.

Feels like they are just pulling in as much as they can whilst competing on capabilities instead. At which point it's a case of who can last the longest.

Doesn't feel like Uber/Netflix.

They're trying to do it more like a cartel where all major providers raise prices in unison. The intention is (probably) less specific entrapment and more getting people addicted to a fast LLM. From there, they all play with pricing to give a semblance of choice, without actually overly undercutting each other. At least, in the west.

This is all done to help valuations. The main revenue source is the investor dollars that come in on the prospect that this industry will very soon actually be sustainable and highly profitable. It won't be, but as long as "very soon" stays consistently around the corner, the investor dollars keep coming.

This is a constantly repeated conspiracy theory and is not true at all. The api costs do increase but aggregate costs per task decrease. The question is: do people need lower intelligence models at all? The answer is a resounding NO!

How many people do you see using haiku or sonnet? I see very few and most people default to the latest model and just play with thinking effort. I think three layers are good enough and supporting more is not a good UX.

Do I need the most intelligent model to generate boilerplate code, which is my main usage for AI? Resounding No.

For my use case a model from a year ago is good enough

Are you only considering coding use cases?

Many enterprise use cases, such as simple data extraction, are well served by cheaper models.

I... use them all the time: plan with a more advanced model, build with a cheaper one. Anthropic literally packages a metamodel (opusplan) for that pattern.

Also: calling the SV blitzscaling strategy of using VC money to fund loss-leader products, with the goal of building a monopoly via dumping, a conspiracy is quite the position given there are entire books written on the topic...

who tf would use mini when you have dsv4 flash

> Maybe it’s the realization that it was never that cheap in the first place and they're forcing us to upgrade in a slow and painful way.

All the analysis I have seen points to frontier models being profitable to serve. It’s using 50% or more of your GPUs for research plus CapEx for capacity expansion that makes these businesses so heavily cash-negative.

What you are observing is downstream of another detail. It gets more expensive to serve a model as utilization goes down. Plus the opportunity cost vs newer, more-profitable models.
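
A back-of-the-envelope sketch of why that happens: a dedicated GPU node costs roughly the same per hour whether it is busy or idle, so the cost per token served scales inversely with utilization. The hourly cost and throughput below are invented for illustration, and the model ignores batching effects and everything else that makes real serving more complicated.

```python
def cost_per_million_tokens(node_cost_per_hour: float,
                            peak_tokens_per_second: float,
                            utilization: float) -> float:
    """USD to serve 1M tokens on a dedicated node at a given utilization."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    tokens_per_hour = peak_tokens_per_second * 3600 * utilization
    return node_cost_per_hour / tokens_per_hour * 1_000_000


# Hypothetical node: $20/hour, 5,000 tokens/second at full load.
for utilization in (0.8, 0.4, 0.1):
    cost = cost_per_million_tokens(20.0, 5000.0, utilization)
    print(f"utilization {utilization:.0%}: ${cost:.2f} per 1M tokens")
```

Under this simple model, halving utilization doubles the cost per token, which is why an old model with shrinking traffic gets steadily less attractive to keep serving at its original price.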

There are plenty of valid reasons to critique here. “OpenAI is lying about this being a sustainable price to serve” is not one of them.

There is really ample analysis pointing to inference not being profitable, look at anything Ed Zitron has reported.

Ed Zitron is amazing at cherry picking data to fit his thesis.

No, you can't. These companies have two infrastructures: model training and model inference.

Inference needs to cache, it can't cache random model data, so it's essentially dedicated; it can't spin up models on demand, it has to know what demand is coming.

These companies are going to end up with very few models offered, and that's probably generous. They might end up with just one model, and you pay for removing its safeguards.