The thing I'm most excited about is the moment that I run a model on my 64GB M2 that can usefully drive a coding agent harness.
Maybe Qwen3.5-35B-A3B is that model? This comment reports good results: https://news.ycombinator.com/item?id=47249343#47249782
I need to put that through its paces.
Yesterday I test ran Qwen3.5-35B-A3B on my MBP M3 Pro with 36GB via LM Studio and OpenCode. I didn’t have it write code but instead use Rodney (thanks for making it btw!) to take screenshots and write documentation using them. Overall I was pretty impressed at how well it handled the harness and completed the task locally. In the past I would’ve had Haiku do this, but I might switch to doing it locally from now on.
I suppose this shows my laziness because I'm sure you have written extensively about it, but what orchestrator (like opencode) do you use with local models?
I've not really settled on one yet. I've tried OpenCode and Codex CLI, but I know I should give Pi a proper go.
So far none of them have be useful enough at first glance with a local model for me to stick with them and dig in further.
I've used opencode and the remote free models they default to aren't awful but definitely not on par with Gemini CLI nor Claude. I'm really interested in trying to find a way to chain multiple local high end consumer Nvidia cards into an alternative to the big labs offering.
Kimi K2.5 is pretty good, you can use it on OpenRouter. Fireworks is a good provider, they were giving free access to the model on OpenCode when it first released.
When you say you use local model in OpenCode, do you mean through the ollama backend? Last time I tried it with various models, I got issues where the model was calling tools in the wrong format.
That's exactly why I'm asking! I'm still mystified about whether I can use ollama at all. I'm hopeful that the flexiblity might become interesting at some point.