I'll say neither of them will do anything for you if you're currently using SOTA closed models in anger and expect that performance to hold.
I'm on a 128GB M4 Max, and running models locally is a curiosity at best given the relative performance.
I'm running an M4 Max as well, and I've found that project goose works decently well with qwen3 coder loaded in LM Studio (Ollama doesn't do MLX yet unless you build it yourself, I think) and configured as an OpenAI model, since the API is compatible. Goose adds a bunch of tools and plugins that make the model more effective.
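If anyone wants to poke at the same setup without goose in the middle, here's a minimal sketch of hitting LM Studio's OpenAI-compatible server directly with the openai Python client. It assumes LM Studio's default local port (1234); the model id is hypothetical, so use whatever identifier LM Studio lists for the model you've loaded.

    from openai import OpenAI

    # LM Studio's local server speaks the OpenAI API; the key is unused
    # locally, but the client requires some value.
    client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

    response = client.chat.completions.create(
        model="qwen3-coder",  # hypothetical id; match what LM Studio shows
        messages=[{"role": "user", "content": "Reverse a string in Python."}],
    )
    print(response.choices[0].message.content)

Goose is doing essentially this under the hood, plus layering its tool-calling and plugins on top.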
It will be sort of decent with a 4-bit, 70B-parameter model, like here: https://www.youtube.com/watch?v=5ktS0aG3SMc (deepseek-r1:70b Q4_K_M). But yeah, not great.