Is there any hope for people who can't even run 27B parameters, Qwen3.6 or otherwise? Are there any quantized models that do well with tool calling at smaller parameter sizes?
I do not have a crazy rig, just a modest gaming one, but in trying to understand more about agents and their capabilities, I am SOL with my 16 GB of RAM and 8 GB of VRAM. I can get most small, non-tool-calling models to perform well, but I've had major issues getting anything over 9B to do more than reasoning (egregiously slow at higher parameter counts).
And so far, I can't get even Pi to extend itself or do any meaningful work with any of the models I can currently get to run.
I suspect with those specs, you're not in the game right now for reliably using local models for code generation. The easiest way in is a MacBook with at least 32 GB of RAM. This should be able to run a 4-bit quantization of Qwen 3.6 using the MLX format really well.
Now that I'm dipping more into this space, I'm going to see what I can upgrade with the motherboard I have, but with RAM pricing as it is, I'll need to be smart about when I upgrade.
I very much appreciate the frank response, as it makes me feel less defeated at knowing my understanding of how it should work is not the full issue, hahaha
M-series Macs are usually used for running these LLMs locally because the GPU and CPU share the same pool of RAM at very low latency. If you upgrade your RAM on a different kind of chipset without the Unified Memory Architecture, then it'll be much slower to produce all the tokens you need. Just another data point to add to your upgrade equation.
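To put a rough number on "much slower": a common rule of thumb (an assumption here, not something measured in this thread) is that token generation is limited by how fast the weights can be read from memory, since each new token needs roughly one pass over the active weights. A minimal sketch of that estimate follows; the function name and the bandwidth figures in the usage comment are hypothetical placeholders, so plug in the real numbers for your own hardware.

```python
def max_tokens_per_second(bandwidth_gb_s: float,
                          active_params_billions: float,
                          bits_per_weight: float) -> float:
    """Rough upper bound on decode speed for a memory-bound model.

    Assumes every generated token requires reading all active weights
    once, and ignores KV cache traffic, compute time, and overhead.
    """
    # Billions of params * bits per weight / 8 = gigabytes read per token.
    gb_read_per_token = active_params_billions * bits_per_weight / 8
    return bandwidth_gb_s / gb_read_per_token


if __name__ == "__main__":
    # Hypothetical bandwidths for illustration only, not vendor specs.
    for label, bandwidth in [("slower memory", 50.0), ("faster memory", 400.0)]:
        estimate = max_tokens_per_second(bandwidth, 27, 4)
        print(f"{label}: at most ~{estimate:.1f} tok/s for a dense 27B at 4-bit")
```

Real throughput will land below this bound, but the ratio between two memory systems is the useful part for an upgrade decision.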
I have 8GB VRAM but 32GB RAM. Qwen 3.6 35B runs nicely.
You should look at gemma-4-26B-A4B. Your 16 GB of RAM plus 8 GB of VRAM gives 24 GB total, and the Q4 quant is about 16 GB. That doesn't leave much room for context, but it might run.
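The arithmetic above can be sketched as a quick budget check. This is a back-of-the-envelope estimate, not a guarantee that the model loads: the function names are made up for illustration, and the 4.9 effective bits per weight is an assumed figure chosen to land near the "about 16 GB" quoted above, since real Q4 files carry some overhead beyond a flat 4 bits.

```python
def weights_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of quantized weights in GB (1 GB = 1e9 bytes)."""
    return params_billions * bits_per_weight / 8


def headroom_gb(ram_gb: float, vram_gb: float, model_gb: float) -> float:
    """Memory left over for the OS, other apps, and the context window."""
    return ram_gb + vram_gb - model_gb


if __name__ == "__main__":
    # Assumption: ~4.9 effective bits per weight for a Q4-style quant.
    model = weights_gb(26, 4.9)
    spare = headroom_gb(ram_gb=16, vram_gb=8, model_gb=model)
    print(f"weights: ~{model:.1f} GB, left for everything else: ~{spare:.1f} GB")
```

Whatever is left has to cover the operating system, the terminal, and the KV cache, which is why the context window ends up tight.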
I have 8 GB of VRAM, but 32 GB of system RAM. I can run Qwen 3.6 35B at 30 tok/s. I also use Pi, and it's smart enough to extend itself (multi-shot and maybe a few tries).
For you, you could try gemma-4-26B-A4B
Thank you for the recommendation, and so far, it has been working great (within reason, haha). It doesn’t kill my rig when thinking, but it definitely needs more training wheels to nudge it towards the goal.
It seemed to get the idea of my prompt to extend the footer info (I want it to show the model abilities, like tool calling or reasoning, where the context percent thing is). It made a plan and wrote the file, but then got hung up on implementation because it couldn't figure out how Pi renders that part of the UI in PowerShell.
So possibly trying a different terminal might help on that front, haha
I think at 16 GB you'd struggle to run the regular development tools nowadays, forget about any interesting inference.
Fully agreed, and my hope is that as open models grow and change, getting some amount of this working on prosumer hardware will be more attainable.
But certainly seems like we are a few years away from that, sadly.
Am I also screwed in being able to train my own small model or adjust another one with such a non-workhorse PC?
Training requires even beefier hardware than inference.
I've got 32 GB of RAM and a 6 GB VRAM card; tried both 27B and 35B, with Pi. And it's a laptop. Speed isn't exactly a concern for me, I can enjoy real life while the agent is doing its thing. And while the models appear smart enough at first glance, once one reads a file that's more than 100 lines, it loses all memory of anything I asked it to do. The lack of a failure state or any indication of what might be wrong here is just frustrating. Guess local models aren't for me, unless I move to Silicon Valley and redeem my free MacBook at a local Starbucks.