Assuming it is almost as good as Opus 4.6 (which benchmarks seem to give evidence for), and assuming we have a good enough harness (PI, OpenCode), it's now more than 5x cheaper.

I just want to remind you that this is happening at the same time as Anthropic A/B tests removing Code from the Pro plan, and as OpenAI releases gpt-5.5 at twice the price of gpt-5.4...

> Assuming it is almost as good as Opus 4.6 (which benchmarks seem to give evidence for)

That’s a big if. In my experience, models that perform very well on benchmarks do not necessarily perform well in real life.

I’ve mostly started ignoring the benchmarks and running my own evals.
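
For what it's worth, a personal eval doesn't need to be elaborate. Here's a minimal sketch of what this can look like, assuming an OpenAI-compatible chat completions endpoint; the model identifiers and tasks are placeholders you'd swap for your own:

```python
# Minimal personal-eval sketch: run the same hand-written tasks against each
# model and grade answers with a crude pass/fail check. Model names and tasks
# are placeholders; assumes an OpenAI-compatible chat completions API.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Tasks drawn from your own real work, not a public benchmark.
TASKS = [
    {"prompt": "What does `git rebase --onto A B C` do? One sentence.",
     "check": lambda a: "replay" in a.lower() or "reappl" in a.lower()},
    {"prompt": "In Python, what does `dict.setdefault(k, [])` return?",
     "check": lambda a: "value" in a.lower()},
]

MODELS = ["model-a", "model-b"]  # placeholder identifiers

def run_eval(model: str) -> float:
    """Return the fraction of tasks the model's answer passes."""
    passed = 0
    for task in TASKS:
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": task["prompt"]}],
            temperature=0,  # keep answers comparable across runs
        )
        answer = resp.choices[0].message.content or ""
        passed += bool(task["check"](answer))
    return passed / len(TASKS)

for model in MODELS:
    print(f"{model}: {run_eval(model):.0%} of private tasks passed")
```

The point is that the tasks come from your own workload, so the scores can't be gamed the way public leaderboards can.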

If benchmarks are all to be believed, then gemini 3.1 and grok 4.2 are still in the lead pack. A laughable notion to anyone who has actually tried to use them and compared the results.