It's a start and I welcome competition but I don't think I ever used small cloud models like Haiku 4.5. They are cute but for serious coding they tend to waste your expensive time.

And this certainly won't bring me back to GitHub Copilot, which I cancelled yesterday.

GitHub Copilot had competitive pricing until yesterday when they changed from per-request to one of the most expensive per-token quotas. Seriously, take a look at their burning subreddit for some laughs: https://www.reddit.com/r/GithubCopilot

I have since changed to DeepSeek Flash on high, which is Sonnet+ level for almost free.

If I feel I still need smarter models I might sign up for $20/mo Codex to use GPT 5.5 which, in my opinion, is the best I can access right now.

I use larger models to organize work into a topologically sorted task graph and pin smaller models to tasks according to their complexity, with a larger model evaluating the work and patching where necessary. This uses haiku quite often for routine work. As a result I’m able to do multi-hour, highly complex work with superior results and a much lower bill: a parent orchestrator can get through a massive amount of labor within a single context window by organizing work effectively, reviewing quality, and integrating where needed. I don’t use haiku directly, but it’s often 30-40% of any major effort’s token use. This improves time to completion as well as cost, and I also find haiku is better at following literal instructions and plans without “second guessing,” while opus class models second guess in their thinking constantly.

As such, haiku isn’t a waste of my time; it saves enormous amounts of time for me. But I spent a large amount of time building the orchestration system up front and iterating on it to get here. Interestingly, I found my experience as a director and later a distinguished engineer gave me the tools to build it and get it working well and reliably end to end: the dynamics of multi-agent workflows of varying capability are not a lot different from the dynamics of a 1000-engineer organization.
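
For readers wondering what the core of such a setup might look like, here is a minimal sketch of the idea described above: sort a task graph topologically, then pin each task to a model tier by complexity. The task names, complexity scores, threshold, and tier labels are all hypothetical, and this is not the commenter's actual system; it relies only on Python's standard `graphlib` module.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each task maps to the set of tasks it depends on.
DEPS = {
    "design_schema": set(),
    "write_migration": {"design_schema"},
    "write_api": {"design_schema"},
    "integrate": {"write_migration", "write_api"},
}

# Hypothetical complexity scores (1 = routine, 5 = hard).
COMPLEXITY = {
    "design_schema": 5,
    "write_migration": 1,
    "write_api": 2,
    "integrate": 4,
}


def pick_tier(score: int, threshold: int = 2) -> str:
    """Pin routine tasks to a small model and everything else to a large one."""
    return "small" if score <= threshold else "large"


def plan(deps: dict[str, set[str]], complexity: dict[str, int]) -> list[tuple[str, str]]:
    """Return (task, tier) pairs in an order that respects every dependency."""
    order = TopologicalSorter(deps).static_order()
    return [(task, pick_tier(complexity[task])) for task in order]


if __name__ == "__main__":
    for task, tier in plan(DEPS, COMPLEXITY):
        print(f"{task}: {tier}")
```

A real orchestrator would also have the larger model review each result before dependent tasks run; that review step is left out of this sketch.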

Everyone does that. But I don't find Haiku useful for actual coding tasks. Good to, ehm, generate commit messages and summaries.

In my tests, openweight Qwens and GLM are way better than it.

Topologically sorted task graph is exactly right — the orchestrator/worker split maps cleanly to senior engineer delegating to juniors, where cheap models handle the leaf nodes fine.

Got anything from your orchestrator you could share that’s usable by others? Sounds like how I’d like to work but is difficult to get going from scratch

https://github.com/7mind/baboon - all the backends apart from C# and Scala ones were created automatically, same for LSP server, same for playground.

I've been doing benchmarking of various models for finding hard security bugs, and my faith in Haiku (and Sonnet, even) has dropped precipitously in the process. Self-hosted Qwen 3.6 27B consistently outperforms both for finding security bugs, which was a shocking result. I expected Qwen to be around Haiku level, maybe a little worse, and I definitely expected it to be worse than Sonnet.

And, DeepSeek and MiMo perform much better than Haiku and Sonnet, near Opus/GPT 5.5 levels, at a fraction of the cost.

There's seemingly no reason to ever use Haiku or Sonnet, if you're not getting it for free or as part of a subscription (that you don't usually saturate).

I don't suppose you've had a chance to benchmark MiniMax V3 yet? I've only just started testing other models after being an Anthropic fan. I haven't put MiniMax V3 to coding tasks yet, but something about my early simple tests has impressed me. The MiniMax API pricing is about 7% of Anthropic API prices (about matching Anthropic's subscription pricing).

I don't think that's what these small models are for. They are for things like text summarization and generating a title for your AI session. Maybe Haiku occupies a weird zone where it's overpowered for those tasks but underpowered for anything more sophisticated. But for example I used it on an agentic reasoning task recently (reading a chunk of information and drawing a written conclusion, not writing code) and it did just fine. More powerful model would have been a waste of money.

Sure, but it's priced higher than many better models. I'm not saying use the biggest models for everything. I'm saying Haiku is not a great deal as small models go. You can even self-host a model that is competitive if you've got a pretty beefy machine.

Haiku costs $1/$5. DeepSeek V4 Flash, a stronger model, is only $0.0028/$0.14/$0.28. That first number is the cached input, and DeepSeek caching is crazy efficient. So, using DeepSeek V4 Flash costs about an order of magnitude less than Haiku and performs better.

I have a Claude subscription because I'm willing to pay a premium for the best model for coding, one that doesn't waste as much of my time doing dumb stuff. But, if I need something other than Claude Code, I'm using something other than Claude models. Why burn money for no benefit?

Oh, also, Haiku chews tokens like crazy. In my benchmarks it used three times more tokens than the next highest model. Of course, security bug hunting is not in its wheelhouse, so it's not fair to judge it based on that one thing, but if it's more expensive per token and burns a lot more tokens, it ends up being a lot more expensive.
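
To make the "more expensive per token and burns more tokens" point concrete, here is a rough sketch using the per-million-token prices quoted above. The workload (1M input tokens, 80% cache hits, 100k output tokens) is a made-up assumption, as is applying the 3x token factor to both input and output. The thread gives no cached-input price for Haiku, so the sketch bills Haiku's cache hits at the full input rate, which overstates its cost if a cache discount applies.

```python
from typing import Optional

M = 1_000_000  # prices are quoted per million tokens


def run_cost(input_tokens: int, output_tokens: int, cache_hit: float,
             input_price: float, output_price: float,
             cached_price: Optional[float] = None) -> float:
    """Dollar cost of one run; without a cached price, cache hits bill at the input price."""
    if cached_price is None:
        cached_price = input_price
    cached = input_tokens * cache_hit
    fresh = input_tokens - cached
    return (cached * cached_price + fresh * input_price
            + output_tokens * output_price) / M


# Hypothetical workload: 1M input tokens, 80% cache hits, 100k output tokens.
deepseek = run_cost(1_000_000, 100_000, 0.8, 0.14, 0.28, cached_price=0.0028)
haiku = run_cost(1_000_000, 100_000, 0.8, 1.0, 5.0)
# Same task if Haiku uses three times the tokens (assumed for input and output alike).
haiku_3x = run_cost(3_000_000, 300_000, 0.8, 1.0, 5.0)

if __name__ == "__main__":
    print(f"DeepSeek V4 Flash: ${deepseek:.4f}")
    print(f"Haiku:             ${haiku:.4f}")
    print(f"Haiku at 3x tokens: ${haiku_3x:.4f}")
```

Under these assumptions the gap comes out well over an order of magnitude, and it widens further once the extra token usage is included.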

I suspect the outrageous pricing of haiku/sonnet is offsetting the cost of opus. The value proposition a year ago was they were cheaper than opus, not that they're a fantastic value (which they're not)

Haiku/Flash/small models are underpowered for literally anything where being non-false-positively correct on details matters at least like 25%. (That's not to say they are only correct 25% of the time, it's definitely more than that, but they're blatantly confidently wrong often enough that the wasted time is a significant net negative for me, even on relatively trivial tasks.)

Same opinion. Opus is best for coding, but Qwen 3.6 27b Q8 is next, before Sonnet.

Sonnet might have more knowledge and is maybe good for making excel sheets, but it does not write good code and does not follow instructions well.

But 27b Q8 needs a very beefy PC (48GB VRAM or more), so it is not an option many people can use and DS4F is so cheap right now, if you are open to externally hosted models.

DeepSeek competes with Sonnet, not significantly worse or better. It tends to do weird things in codebases on the bigger side.

At $3/$15, Sonnet is more than an order of magnitude more expensive than DeepSeek at $0.435/$0.87 (with cached input pricing of $0.003625, DeepSeek is very good at caching, so it's very cheap to use). So, if they're equal in performance, DeepSeek is ten times better value.

But, from what I can tell DeepSeek is better than Sonnet, though I agree it is not at the level of current Opus or GPT 5.5 (but I think it probably beats Gemini Pro 3.1). I use the best model I can for code, because the cost of weaker performance is more than the $100/month I pay for Claude Opus, but it's worth knowing there are very cheap, very good, models for stuff I want to do that isn't Claude Code.
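
As a quick check on the Sonnet versus DeepSeek price comparison above, the sketch below computes the ratio separately for input and output tokens, then for a blended mix. The 75/25 input-to-output split is a hypothetical assumption, and caching is ignored.

```python
def blended(input_price: float, output_price: float, input_share: float) -> float:
    """Average price per million tokens for a given share of input tokens."""
    return input_price * input_share + output_price * (1 - input_share)


# (input, output) prices per million tokens, as quoted in the thread.
SONNET = (3.0, 15.0)
DEEPSEEK = (0.435, 0.87)

input_ratio = SONNET[0] / DEEPSEEK[0]
output_ratio = SONNET[1] / DEEPSEEK[1]
# Hypothetical mix: 75% input tokens, 25% output tokens, no caching.
mix_ratio = blended(*SONNET, 0.75) / blended(*DEEPSEEK, 0.75)

if __name__ == "__main__":
    print(f"input: {input_ratio:.1f}x, output: {output_ratio:.1f}x, mix: {mix_ratio:.1f}x")
```

On these figures the gap is roughly 7x on input and 17x on output, so whether it exceeds an order of magnitude overall depends on the token mix and on how many input tokens hit the cache.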

I think there are so many variables, from harnesses to tasks, that it is very hard to put the models into a pecking order unless one beats another in virtually every task (like in Opus vs DeepSeek).

But all in all, I don't think we disagree.

Almost exactly the same story here. I've also had little to no refusals from DeepSeek, with its Chinese values meaning substantially less friction when it comes to things like reverse engineering, finding copyrighted files, working with dubiously-sourced source code, et cetera. I don't think I'd go back to Copilot even if they dropped prices by 90%.

Are you purchasing directly from DeepSeek? Any concerns as far as privacy or data protection?

Using OpenRouter, going to migrate to DeepSeek's official API soon. I'm not using it for anything commercial or for private data so I have no privacy qualms.

Makes sense. Privacy is my only real hang up with DeepSeek. Both of the big SOTA providers have become extremely filtered. Things that I could do one version ago are now getting refusals. Anthropic is almost unusable. ChatGPT is slightly better. Even with a "cyber exception" in place and a vetted account. They are going to force me to take my business elsewhere.

GitHub Copilot refuses to do any security testing or proof-of-concepts for exploits. While I understand why, we pay for Enterprise and I’m working on our proprietary code base. It’s incredibly annoying.

I’ve actually had luck taking the analysis from GHCP and pasting it into our M365 Copilot and getting a useful poc to stick into my bug reports.

You can always run deepseek yourself, v4-pro and flash are open weights. It's a little tricky to get the hang of self deploying open weight models but you do fully own your deployment substrate and privacy narrative at that point.

> Any concerns as far as privacy or data protection?

We moved to OpenCode Go ($10/mo), so we could switch between DeepSeek v4, GLM 5.1, and Qwen 3.7 models run by providers in EU, US, & Singapore that OpenCode FAQ claims don't use retained data for training.

  What about data and privacy?

  The [OpenCode Go] plan is designed primarily for international users, with models hosted in the US, EU, and Singapore for stable global access. Our providers follow a zero-retention policy and do not use your data for model training.

I find their rather verbose privacy policy does not make far-reaching guarantees about any of this, though: https://opencode.ai/legal/privacy-policy

Yeah, seems like this is in the range of Qwen 3.6, Gemma 4, Nemotron 3 Super, and the like. There are a lot of models, including much smaller cheaper ones (like Qwen 3.6 35B-A3B), that are similarly competitive with Haiku. I can run these on my laptop, I don't need to rent them from Microsoft.

I suppose if you're reeling at the new Copilot bill but want to stay in their ecosystem, this gives you something to use, but for most folks, there's a plethora of better options.

Agreed. Seems like this could have been a nice model if we were still in the old GitHub Copilot free request / premium multiplier mode. It could have been a good compromise to somehow rein in the costs for Microsoft.

But with Copilot now just charging per-token prices, I don't see how this is competitive with Chinese models.

It is probably telling that you can't find the costs in the announcement. Because input $0.75, cached input $0.075, output $4.50 might be competitive with Haiku, but nobody in their right mind uses Haiku, and Anthropic has abandoned it while chasing the tokenmaxers who aren't thinking about budgets.

So I guess they are aiming for corporate customers who are bound to Microsoft through compliance approval, will soon start seeing their budgets explode, and have to find some corporate compromise.

The $20/month ChatGPT plan that comes with codex is good value. Even just having premium ChatGPT is nice. I get rate limited regularly but it still lets me do most things.

The $100/month is excellent value. I don’t understand how’s that not the default option for all professional developers. Unless people don’t produce any value writing code, like playing around and experimenting with vibe coding, I understand. But if software development is your actual income, and assuming you live in a wealthy country, $100/month is nothing for a tool like Codex.

Picked up the most recent SO developer survey that features relevant info, the 2024 release: https://survey.stackoverflow.co/2024/work#coding-outside-of-...

The supermajority of respondents did report that they do engage in some coding outside of working hours, for one reason or another. I'm impressed; I'm basically a zombie after hours, rarely in any shape to touch anything technical. Good for them.

But then only 19.3% of respondents ticked that they code for freelancing reasons, and only 15% said they're doing it in an attempt to bootstrap a business. These groups were the only types that suggested revenue generating after-hours activity, and they even overlap to a non-obvious-to-me extent. But even if we pretended they didn't, that adds up to like a third at best.

So when you say:

> I don’t understand how’s that not the default option for all professional developers.

that's in contradiction with this data (and imo common sense), which suggests that the supermajority of professional developers simply do not perform revenue generating software development activity outside of work hours, period. Therefore, for them, the ROI on any potential AI subscription is a flat and constant zero.

Unless you envision people working at "bring your own license" type shops, I don't know how this is supposed to make sense. These are work tools, corporate should be providing them already. But then I'm clearly not from a "wealthy" country either, so YMMV.

Work pays for my work stuff and I have both claude and codex there. On the personal side I sometimes go days without using it. It's more like my assistant to do annoying terminal shit on my home computer and like personal projects I guess. It's plenty for that.

It's because that price point is for individuals not for companies. So my company can't pay for the $100 plan unlike with Claude. Only pay-as-you go pricing is available for companies beyond the $29 plan which runs out for me in 2/5 hours. And pay-as-you-go is insanely expensive.

I don't use LLMs for code generation except for very simple, small things because they suck at it and I wouldn't want to ship what they write.

Since I use LLMs basically only for analysis and as a signal in bug discovery, debugging, research and general search, I don't need a very powerful model and I don't need high token counts. A $100 subscription would be entirely way too much for useful usage for me, and would border on just using tokens for the sake of using them.

Every developer who writes code for a living should get an AI subscription from work and not have to pay for it himself.

I don’t live in a wealthy country and my salary isn’t that great, but Anthropic’s 100 USD tier is still worth it for me. I’d probably go with a 50 USD tier if they had one but oh well. I’m also looking at DeepSeek since they permanently lowered their prices and feel like I could probably add the cheaper Codex tier to the list (you really feel the limits with the cheaper Anthropic one though).

The small stuff has its place. I have this safari extension and needed a way to quickly title people's chat histories. Haiku is the fast cheap thing to come up with decent titles of blocks of text. I feel like there's a bunch of those little things lying around you need a model for. I'm even finding Apple's Foundation Model is super useful for stuff like that. Even summarizing an article. It's like equally awful at doing it, but gets enough done to still be useful as a way to be like "oh yeah, this article is actually worth reading".

Small models are super useful. But I'm skeptical of their use for coding in particular, which is what this model is advertised for.

If you use claude-code, Haiku is used under the hood for certain tasks. I'm not sure which ones, but there's some kind of routing that goes to Haiku automatically.

Won’t (presumably) all the market actors converge on similar pricing? If OpenAI stopped operating on subsidies and charged the true costs, and their most token-hungry customers are the ones that switch to Anthropic and others, then their pricing model switch will also be around the corner.

Unless of course we’re thinking Copilot will be more expensive than others longer term. But is that a reasonable assumption?

Anthropic & co charge API users much more, not least to demolish the middlemen low-effort plays like Cursor and Copilot. To not own the model is not viable in 2026.

Sorry, what do you mean by "To not own the model is not viable in 2026."

I assume I'm misunderstanding you (likely my fault), because the way I read that is that you're saying nobody should currently be using models owned & hosted by companies like OpenAI and Anthropic, while clearly a huge number of people are using those in 2026 despite not owning them.

It's that companies like copilot/cursor are in real trouble if they are in the business of reselling expensive Anthropic tokens

But isn't the current understanding that the harness is as important as the model once you get above a certain threshold? So there seems to be room to add value there.

Cursor is potentially about to be acquired by X.ai (i.e. SpaceX), unless this is just some IPO game being played by Musk. They are certainly not just a token reseller since they have their own models in addition to their own vector database approach for code matching.

I think it’s more correct to say they charge subscription users much much less. I assume less even than the cost of providing the inference, if you actually are using it.

Haiku does quite well if given a detailed plan. That means much more detail than you otherwise would, but you can still save over e.g. having Opus or Sonnet do everything by having them expand their initial plans into more specific levels of detail and feed it to Haiku (or similar level models).

I personally wouldn't use models of that class directly, though - I'd use them in a harness as a "backend" for more capable models. And Haiku itself, as opposed to other smaller models, is still expensive.

Makes sense as part of a larger coding workflow, especially if it’s fast. Using a trillion parameter model to figure out how to call a targeted edit tool or generate a commit message is a waste. Also narrow tasks like “make the background darker” or “rename this function and update callers”

> “rename this function and update callers”

I'm old enough to remember when IDEs could do this without needing a couple gigabytes of matrices to do it

(LLMs are great for anything even slightly more complicated ofc)

The first time I was impressed by AI coding was when I pointed it at some switch case monster code and told it to replace it with a strategy pattern.

And it did just fine.

So no matter what you think about vibe coding, using AI for these slightly more complicated use cases is genuinely useful.


I've been having really good results with DeepSeek-v4-flash, qwen-3.6-moe, and the older gemini-3-flash-preview. (recent geminis suck hard)

Small models are more than enough for the majority of tasks these days. Plan and review with the bigger ones, let the little ones explore and implement.

OpenCode Go is $10/month for the open weight models with nice quotas: https://opencode.ai/go

You don’t have to limit yourself to the tiny models with the OpenCode Go plan, you can get a lot of usage from the bigger models if you keep the cache hot.

I am about 85% through my quota with 9 days left before refresh and have just used over 1B tokens, mostly DeepSeek V4 Pro, but also a little mimo 2.5 pro and kimi k2.6

For sure, I've been flipping between flash/pro (or the equivalent for other families), been trying to stick to one family per project as a way to test them out independently over longer periods and more realistic/diverse tasks. I've definitely spent more quota on pro and pushed more tokens through flash.

What application/UI are you using DeepSeek Flash high on? Still Copilot, or something else?

> "GitHub Copilot had competitive pricing until yesterday when they changed from per-request to one of the most expensive per-token quotas. Seriously, take a look at their burning subreddit for some laughs"

AI is expensive and it has been heavily subsidized. If you think $20/mo flat for Codex/Claude will hold up against a more usage-based model, you're in for a shock. Especially once these companies go public and have to meet investor expectations.

> They are cute but for serious coding they tend to waste your expensive time.

90% of corporate job tasks are trivial enough that Haiku can handle them.

Just this morning I have been implementing a reprint functionality in our warehouse management system, which needed to print again carrier labels and delivery notes for a specific order.

It essentially had to do the same workflow of print, but instead of generating and uploading the pdfs, it only had to fetch and print them.

Took Opus 4.8 high 24m1s and 87k tokens. Took Haiku 6m30s and half the tokens.

So not really sure what you mean by "wasting your expensive time" here. I think you really don't experiment with these tools and assume higher effort, bigger model => time saved, but that's true only when tasks are much bigger and complex enough that a smaller/less precise model would fail or land work of much lower quality.

Unfortunately there's no defending Haiku 4.5 at this point when cheaper and better options are available.

TLDR:

https://artificialanalysis.ai/models?models=gemini-3-5-flash...

and: https://i.imgur.com/nTu3VCZ.png

For starters, I did experiment a heck of a lot with models, since GitHub Copilot gave me access to OpenAI, Gemini and Anthropic models. So I probably experimented more than the average LLMer. When GitHub Copilot had a generous quota I ran the same tasks with many models to compare them (and pursue the best solution among them) quite often.

Now about my experience with Haiku, I think it was free for some time in GitHub Copilot, then it was 0.33x quota usage (when Sonnet was 1x and Opus was 3x, good times). I tried to use it for light coding for about a week.

In my tests I concluded that there was zero reason to use 0.33x priced Haiku in my coding workload because it constantly generated subpar solutions. Even when they worked, Sonnet at 1x and Opus at 3x quota usage had a lot less tech debt on average and my plan permitted continuous Sonnet/Opus usage for my workload, otherwise I would use Gemini Flash (the old one, not this 3.5 one) which was better than Haiku by a mile.

Then GPT 5.4 came at 1x quota usage and it was competitive with Opus at 3x quota usage. So I stopped using Opus in favor of GPT and by this time there was even less reason to use Haiku on my $39/mo GitHub Copilot plan.

And now we have DeepSeek v4 which is Sonnet+ levels in my tests because it has an actual 1 million token context window and their crazy alien caching tech (https://huggingface.co/blog/deepseekv4).

I urge you to throw $5 at OpenCode Go plan for 30 days and toy around with DeepSeek Flash on high setting (not max).

Or MiMo 2.5 Pro on the same OpenCode Go plan. 2 amazing models.

> DeepSeek Flash on high setting

In your experience, is max worse or you suggest it for less token use?

> MiMo 2.5 Pro on the same OpenCode Go

Xiaomi dropped MiMo 2.5 rates by 70%+ [0] & now it is cost competitive with DeepSeek v4 Pro. I haven't used MiMo, but since you have, do you find it to be better than DeepSeek v4? If so, for what tasks? How do you decide when to use which, if you have an intuition for it? Thanks.

[0] https://news.ycombinator.com/item?id=48282814

> In your experience, is max worse or you suggest it for less token use?

Yes. DS4 Flash max is incredibly chatty for minimal gain over DS4 high.

I asked the same question a month ago: https://news.ycombinator.com/item?id=47978820 and confirmed in my tests.

> ...MiMo, but since you have, do you find it to be better than DeepSeek v4?

I didn't test MiMo 2.5 enough to form a verdict, but from initial tests it is equivalent to DS4. But MiMo 2.5 (non Pro) has the advantage of having vision capability, and MiMo is now priced the same as DeepSeek v4 in the $10/mo OpenCode Go after the discount you mentioned; see the yellow bars at https://opencode.ai/go

I'll start testing MiMo seriously next week.

I really hope one day there is something like Opus 4.8 but with Cerebras' speed -- they reach over 1,000t/s on gpt-oss-120b but that model is seemingly not even properly trained for tool calling. But watching it slam out several entire screens of thinking/reasoning per second is amazing. I'd love that with Opus quality.

I like gpt oss - great model even if not too smart. It runs on my laptop at over 100 t/s and has a certain tone that I like over all these qwens stuck up their asses.

I wonder when THEY make it illegal to vote with your wallet.