Given that a non-quantized 700B monolithic model with, say, a 1M token context would need around 20TB of memory, I doubt your Spark or M4 will get very far.
I'm not saying those machines can't be useful or fun, but they're not in the range of the 'fantasy' thing you're responding to.
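For anyone curious where a figure like that comes from, here's a rough sketch of the arithmetic. Every hyperparameter below is an assumption picked for illustration, not the spec of any real model; actual totals swing a lot with GQA/MQA, depth, and precision, but the scale stays far beyond a single desktop box either way.

```python
# Back-of-envelope memory estimate for serving a large dense model
# with a very long context. All values are illustrative assumptions.

params = 700e9            # 700B parameters, dense (no MoE)
bytes_per_param = 2       # FP16/BF16, i.e. "non-quantized"
weights_gb = params * bytes_per_param / 1e9

# KV cache: 2 tensors (K and V) per layer, per token.
layers = 120              # assumed depth
kv_heads = 128            # assumed full multi-head attention, no GQA
head_dim = 128            # assumed
context = 1_000_000       # 1M tokens
kv_bytes_per_token = 2 * layers * kv_heads * head_dim * bytes_per_param
kv_cache_gb = kv_bytes_per_token * context / 1e9

print(f"weights:  {weights_gb / 1000:.1f} TB")   # ~1.4 TB
print(f"KV cache: {kv_cache_gb / 1000:.1f} TB")  # ~7.9 TB at these settings
```

With these made-up settings you land around 9TB before batching, activations, or framework overhead; somewhat wider or deeper assumptions push you toward the ~20TB ballpark.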
I regularly use Gemini CLI and Claude Code, and I'm convinced that Gemini's enormous context window isn't that helpful in many situations. I think the more you put into context, the more likely the model is to go off on a tangent and end up with "context rot", or to get confused and start working from an older, no-longer-relevant context. You definitely need to manage and clear your context window; the only time I'd want such a large context window is when the source data really is that large.