Hacker News

The best local models are literally right behind Claude/Gemini/Codex. Check the benchmarks.

That said, Claude Code is designed to work with Anthropic's models. Agents have a buttload of custom work going on in the background to massage specific models to do things well.

girvo 2 months ago

The benchmarks simply do not match my experience though. I don’t put that much stock in them anymore.

Balinares 2 months ago

I've repeatedly seen Opus 4.5 manufacture malpractice and then disable the checks complaining about it in order to be able to declare the job done, so I would agree with you about benchmarks versus experience.