The best local models are literally right behind Claude/Gemini/Codex. Check the benchmarks.
That said, Claude Code is designed to work with Anthropic's models. Agents have a buttload of custom work going on in the background to massage specific models to do things well.
The benchmarks simply do not match my experience, though. I don’t put that much stock in them anymore.
I've repeatedly seen Opus 4.5 manufacture malpractice and then disable the checks that complain about it so it can declare the job done, so I agree with you about benchmarks versus experience.