So what other models use less than half of Haiku's tokens while providing a higher success rate?

Why is Haiku the benchmark, though? With code generation, don't we primarily care about the quality of the code, not the speed or efficiency at which it's generated?

You would be surprised how much code Haiku writes behind the scenes, with the whole 'plan w/ opus, spawn subagents w/ haiku' approach that Claude Code uses. And you'd be surprised how useful the small models can be under some guidance / hand-holding. You can daily-drive gpt5-mini and still find it useful. They're not as good as the big ones, obviously, and can't handle a project start-to-finish on their own, but given a well-scoped task, they'll do it just fine.

I'm not sure I follow, but I'll give you a very fresh example.

I was implementing re-print functionality in my warehouse management system.

It took Opus 4.8 high 24m1s and 87k tokens. It took Haiku 6m30s and 41k tokens.

After that, I had to make (minor) adjustments to both. But Haiku allowed me to iterate faster. Code quality for that somewhat trivial use case was similar.

Actually, I would even say that Opus provided a subpar solution: instead of fixing an issue where the carrier label PDF wasn't saved as the state machine progressed to the latest step, it went for a much more complex solution of re-generating the labels from scratch. Which is also wrong, as it was de facto booking the carriers twice for the same order.

Haiku simply added another field on the terminal state that carried the already generated URLs.
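To make that concrete, here is a minimal sketch of the idea, not my actual code. The state names (`LabelsGenerated`, `Completed`), the `label_urls` field and the `complete` / `reprint` helpers are all made up for illustration. The point is only that the terminal state carries the URLs forward, so a re-print reads them back instead of calling the carrier again.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabelsGenerated:
    """Hypothetical intermediate state: carrier booked, label PDFs exist."""
    order_id: str
    label_urls: tuple[str, ...]


@dataclass(frozen=True)
class Completed:
    """Hypothetical terminal state; label_urls is the extra field."""
    order_id: str
    label_urls: tuple[str, ...] = ()


def complete(state: LabelsGenerated) -> Completed:
    # Copy the already generated URLs forward instead of dropping them.
    return Completed(order_id=state.order_id, label_urls=state.label_urls)


def reprint(state: Completed) -> tuple[str, ...]:
    # Re-print only reads stored URLs; it never books the carrier again.
    if not state.label_urls:
        raise ValueError(f"no stored labels for order {state.order_id}")
    return state.label_urls
```

With something like this, `reprint(complete(LabelsGenerated("order-1", ("https://example.invalid/label.pdf",))))` just hands back the stored URL, and no second booking happens.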

I don't think it's a good idea to default to the highest effort or the biggest model without taking into account the time it takes and the task complexity.

IMHO we should experiment rather than assume that what the rest of the community does is best practice.

Totally agree. I've been using cheap Chinese open-source models via OpenCode Go, and they are faster, cheaper and in my experience arrive at the solution quicker because they are more pragmatic.

Yesterday Codex was making a big issue out of a module that had been upgraded in our cluster, because of which Terraform would "regenerate" the same SSH key. No big deal: it just truncates a newline at the end of the SSH key, and it works all the same. But not being aware that this, as an example, is unimportant can cost a lot more time than using the big models saves.
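For what it's worth, the sanity check I have in mind is roughly the following. This is a sketch, not something Terraform provides: the function name is made up, and the key string in the usage note is a placeholder. It only tells you whether two values differ by nothing more than trailing newlines.

```python
def differs_only_by_trailing_newline(old: str, new: str) -> bool:
    """True if the strings differ, but only in their trailing newlines."""
    # Hypothetical helper: strip trailing CR/LF from both sides and compare.
    return old != new and old.rstrip("\r\n") == new.rstrip("\r\n")
```

If `differs_only_by_trailing_newline("ssh-ed25519 AAAAEXAMPLEKEY user@host\n", "ssh-ed25519 AAAAEXAMPLEKEY user@host")` comes back true, the "regenerated" key is the same key and the diff is noise.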