> You have MoE models (gemma 26b, qwen 35b, north mini code 30b) who are pretty fast, but make a lot of mistakes
This is sadly also my experience. I wish we had some MoE models with a higher ratio of active to total parameters. The newer MoE models that can run on a 64gb laptop seem to have too few active parameters to be useful outside narrow, specific tasks. Mixtral 8x7b was an MoE with roughly 13b active parameters (about 47b total, a bit less than the naive 8x7 arithmetic suggests because the experts share the attention layers) and was probably the best model one could run in that range for some time, but it is too old now.
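To put a number on the "active per total" ratio I mean, here is a quick sketch. The Mixtral figures are the published ones (about 12.9b active of 46.7b total, slightly below the naive 8x7 arithmetic); the "30b total, 3b active" entry is a made-up example of a sparse recent MoE, not the spec of any particular model, and `active_ratio` is just a helper name I picked.

```python
def active_ratio(active_b: float, total_b: float) -> float:
    """Fraction of the model's parameters used for each token."""
    if total_b <= 0 or not (0 <= active_b <= total_b):
        raise ValueError("expected 0 <= active_b <= total_b and total_b > 0")
    return active_b / total_b


# (active, total) in billions of parameters.
models = {
    "Mixtral 8x7b": (12.9, 46.7),                 # published figures
    "hypothetical 30b MoE, 3b active": (3.0, 30.0),  # illustrative only
    "dense 27b": (27.0, 27.0),                    # dense: every parameter is active
}

for name, (active_b, total_b) in models.items():
    print(f"{name}: {active_ratio(active_b, total_b):.0%} active")
```

The point is only the gap between the rows: a dense model sits at 100%, Mixtral at a bit over a quarter, and a very sparse MoE can be near a tenth.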
I have been using the qwen 27b and it is great, but running a dense model like this on a MacBook is a bit suboptimal, and I wish I could run something faster than 15 tok/s.
I would try a 6-bit MoE, maybe with unsloth's studio. They claim to have auto tool fixing, which is where I see a lot of issues with MoEs.
I'm on a 48gb M5 Pro right now and it's been okay. A lot of my rough experiences have been with MLX, and I'm finding that GGUFs are okay now.
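If anyone wants to sanity-check whether a 6-bit quant fits in 48gb, a rough back-of-the-envelope sketch is below. It assumes weights dominate memory and counts exactly `bits_per_weight` bits per parameter; real quant formats add a little for scales, and the KV cache and the OS need room too, so treat the result as a lower bound. `weight_gb` is a helper name I made up, and the sizes are just the totals mentioned in this thread.

```python
def weight_gb(total_params_b: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (decimal) for a quantized model.

    total_params_b is in billions of parameters, so
    billions of params * bytes per param = gigabytes.
    """
    if total_params_b < 0 or bits_per_weight <= 0:
        raise ValueError("expected non-negative params and positive bits")
    return total_params_b * bits_per_weight / 8


# Total parameter counts (in billions) taken from the model names above.
for total_b in (26, 30, 35):
    print(f"{total_b}b at 6-bit: ~{weight_gb(total_b, 6):.1f} GB of weights")
```

Note that for an MoE it is the total parameter count that has to fit in memory, not the active count; the active count mostly affects speed.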