I really recommend Qwen3.6 27B.
I ran some tests: its 8-bit version runs at 30 tok/s using llama.cpp with MTP on a MacBook with an M5 Max. I have 128 GB, but 64 GB is plenty. https://github.com/stared/benching-local-llms-on-apple-silic...
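A quick back-of-envelope for why 64 GB is enough. This is a rough sketch that assumes "27B" means about 27 billion parameters and that 8-bit means about 1 byte per weight. Formats like llama.cpp's Q8_0 also store a small per-block scale, so real files come out slightly larger; 8.5 bits/weight is used here as an assumed upper figure. It counts weights only, not the KV cache, context, or the OS.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9

# Assumption: "27B" in the model name means roughly 27 billion parameters.
N_PARAMS = 27e9

# 8.0 = plain 8-bit; 8.5 = assumed figure including per-block scale overhead.
for bits in (8.0, 8.5):
    print(f"{bits} bits/weight: ~{weight_memory_gb(N_PARAMS, bits):.1f} GB")
```

Either way the weights land under 30 GB, which leaves a 64 GB machine a lot of headroom for the KV cache and everything else.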
On benchmarks, it performs roughly at the level of mid-to-late 2025 SotA models.
I run the exact same model on the exact same hardware, with amazing results. Pair it with good search tools (Tavily, Brave, Exa) and you have a near-SOTA model on your desk.
Did you mean 2025?
Yes, fixed