I have been thinking about how to use this. Since it doesn’t support tool calling, I have been considering a dual-model deployment, where a small tool-calling LLM drives most of the user experience and hands off to vibe thinker whenever a request needs real reasoning.
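To make the dual-model idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `CALL <tool>: <argument>` reply convention, the `run_turn` helper, and the two stub models are made up, and each "model" is just a function from prompt to text that you would replace with a call to whatever inference server you run.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Hypothetical interface: a model is any function mapping a prompt to text.
ModelFn = Callable[[str], str]


@dataclass
class ToolCall:
    name: str
    argument: str


def parse_tool_call(reply: str) -> Optional[ToolCall]:
    """Parse 'CALL <tool>: <argument>' (a made-up convention, not a real API)."""
    if not reply.startswith("CALL "):
        return None
    head, sep, argument = reply[len("CALL "):].partition(":")
    if not sep:
        return None
    return ToolCall(name=head.strip(), argument=argument.strip())


def run_turn(user_msg: str, driver: ModelFn, tools: Dict[str, ModelFn],
             max_steps: int = 3) -> str:
    """Let the small driver model answer, delegating to tools when it asks to."""
    prompt = user_msg
    reply = ""
    for _ in range(max_steps):
        reply = driver(prompt)
        call = parse_tool_call(reply)
        if call is None or call.name not in tools:
            return reply  # plain answer (or unknown tool): stop here
        result = tools[call.name](call.argument)
        # Feed the tool result back to the driver for the final wording.
        prompt = f"{user_msg}\n[{call.name} result]\n{result}"
    return reply


# Stub models standing in for the real ones (assumptions, for demonstration).
def fake_driver(prompt: str) -> str:
    if "[reason result]" in prompt:
        return "Answer: " + prompt.split("[reason result]\n", 1)[1]
    if "prove" in prompt.lower():
        return "CALL reason: " + prompt
    return "Hello!"


def fake_reasoner(question: str) -> str:
    return f"(reasoned about: {question})"


if __name__ == "__main__":
    print(run_turn("Prove that 2 is prime", fake_driver, {"reason": fake_reasoner}))
```

The point of the design is that the reasoning model is exposed to the driver as just another tool, so it never needs tool-calling support itself. The `max_steps` cap is there so a confused driver cannot loop forever.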

So who has suggestions on small models with excellent tool calling capabilities?

Gemma 4 E4B and Qwen 3 4B are pretty good, but fine-tuning makes them really good. There are tradeoffs at this size, so you'll have to find (or make) a finetune that does what you need.

Qwen3.6-35B-A3B is pretty amazing. I'm running it with a 96k context on 24GB of VRAM through Ollama.

Maybe bonsai 8b would complete the duo. If you do try it, please post here, as I'm a bit curious too.

granite 4