you're killing my dream of blowing $50-100k on a desktop supercomputer next year and being able to do everything locally ;)
"the hacker news dream" - a house, 2 kids, and a desktop supercomputer that can run a 700B model.
Take a look at: https://www.nvidia.com/en-us/products/workstations/dgx-spark... . IIRC, it was about $4K.
Given that a non-quantized 700B monolithic model with, say, a 1M-token context would need around 20TB of memory, I doubt your Spark or M4 will get very far.
I'm not saying those machines can't be useful or fun, but they're not in the range of the 'fantasy' you're responding to.
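As a rough sanity check on that number, here's a back-of-envelope sketch. The layer count, hidden size, and the plain-MHA (no GQA/MLA) assumption are guesses for illustration, not the specs of any particular 700B model:

```python
def total_memory_tb(params_b=700, bytes_per_param=2,
                    n_layers=80, hidden_dim=16384, bytes_per_kv=2,
                    context_tokens=1_000_000):
    """Back-of-envelope: unquantized fp16 weights plus KV cache."""
    weights = params_b * 1e9 * bytes_per_param           # fp16 weights
    # One K and one V vector per layer per token (plain MHA assumed)
    kv_per_token = 2 * n_layers * hidden_dim * bytes_per_kv
    kv_cache = kv_per_token * context_tokens
    return (weights + kv_cache) / 1e12

print(round(total_memory_tb(), 1))  # ≈ 6.6 TB under these assumptions
```

Under these particular assumptions it's single-digit terabytes rather than 20TB, but either way it's one to two orders of magnitude beyond a 128GB desktop, before counting activations or any serving overhead.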
I regularly use Gemini CLI and Claude Code, and I'm convinced that Gemini's enormous context window isn't that helpful in many situations. The more you put into context, the more likely the model is to go off on a tangent and you end up with "context rot", or it gets confused and starts working from an older, no-longer-relevant context. You definitely need to manage and clear your context window; the only time I'd want such a large one is when the source data really is that large.
An M4 Max has twice the memory bandwidth (which is typically the limiting factor).
I'll say neither of them will do anything for you if you're currently using SOTA closed models in anger and expect that performance to hold.
I'm on a 128GB M4 Max, and running models locally is a curiosity at best given the relative performance.
I'm running an M4 Max as well, and I've found that goose works decently well with Qwen3 Coder loaded in LM Studio (Ollama doesn't do MLX yet unless you build it yourself, I think) and configured as an OpenAI model, since the API is compatible. Goose adds a bunch of tools and plugins that make the model more effective.
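For reference, the wiring is just LM Studio's OpenAI-compatible local server with goose pointed at it. The config keys below are a sketch from memory, not copied from goose's docs, and the port is only LM Studio's default; verify both against the actual documentation:

```yaml
# ~/.config/goose/config.yaml  (key names are an assumption; check goose docs)
GOOSE_PROVIDER: openai
GOOSE_MODEL: qwen3-coder            # whatever name LM Studio lists for the loaded model
OPENAI_HOST: http://localhost:1234  # LM Studio's default OpenAI-compatible endpoint
```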
It will be sort of decent with a 4-bit 70B-parameter model, like here: https://www.youtube.com/watch?v=5ktS0aG3SMc (deepseek-r1:70b Q4_K_M). But yeah, not great.
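The arithmetic on why the 4-bit 70B case fits where 700B doesn't: at roughly 4.8 effective bits per weight (a hedged estimate for Q4_K_M's mixed 4/6-bit blocks, not an exact figure), the weights alone land around 42GB, which leaves headroom in 128GB for KV cache and the OS:

```python
def quantized_weights_gb(params_b, effective_bits=4.8):
    """Rough in-memory size of quantized weights alone (no KV cache, no overhead)."""
    return params_b * 1e9 * effective_bits / 8 / 1e9

print(round(quantized_weights_gb(70)))   # ≈ 42 GB: fits on a 128GB machine
print(round(quantized_weights_gb(700)))  # ≈ 420 GB: still far too big for 128GB
```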