> An Opus 4.7/Gpt5.5 class model is 5 trillion parameters[1].
You could run it on a cluster of nodes that each do some mix of fetching parameters from disk and caching them in RAM. Use pipeline parallelism to minimize network-bandwidth requirements, given the huge size. Time to first token may be slow, but sustained inference should achieve enough throughput for a single user. That's a costly setup, of course, but it doesn't cost $900k.
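A rough sanity check on that throughput claim: single-stream decode is memory-bandwidth bound, since every token requires reading the active weights. The sketch below uses assumed numbers throughout (node count, per-node bandwidth, quantization, MoE active fraction are all illustrative, not from the thread):

```python
# Back-of-envelope decode throughput, memory-bandwidth bound.
# All numbers below are assumptions for illustration, not measurements.

PARAMS = 5e12            # 5T parameters (per the parent comment)
BYTES_PER_PARAM = 1      # assumes fp8/int8 quantization
NODES = 16               # assumed cluster size
RAM_BW_PER_NODE = 400e9  # bytes/s, assumed per-node RAM bandwidth

total_bytes = PARAMS * BYTES_PER_PARAM  # 5 TB of weights

# With pipeline parallelism, each node holds 1/NODES of the layers.
# For a single decode stream the stages run sequentially, so the
# per-token time is the sum of each stage's weight-read time:
per_token_s = (total_bytes / NODES / RAM_BW_PER_NODE) * NODES
print(f"dense: {1 / per_token_s:.2f} tok/s")

# If the model is MoE with ~5% of parameters active per token
# (assumed fraction), only the active experts' weights are read:
ACTIVE_FRACTION = 0.05
per_token_moe_s = per_token_s * ACTIVE_FRACTION
print(f"MoE:   {1 / per_token_moe_s:.2f} tok/s")
```

Note that for a single user, pipelining doesn't multiply decode throughput (stages idle while one token flows through); it mainly buys you enough aggregate RAM to hold the weights without hitting disk.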
> You could run it on a cluster of nodes
Not sure this is an MBP either.
Not even a cluster of Mac Pros could run a dense 5T-parameter model over RDMA, as far as I know.
SOTA models are reportedly MoE, not dense.