To my knowledge, not even a cluster of Mac Pros linked over RDMA could run a dense 5T-parameter model.
SOTA models are reportedly MoE, not dense.
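
For a rough sense of scale, here is a back-of-the-envelope sketch of the weight memory alone. The 512 GB per-node figure is a placeholder I picked for illustration, not a Mac Pro spec, and the function names are made up. It ignores KV cache, activations, and interconnect overhead, so real requirements would be higher.

```python
def weight_bytes(params: int, bits_per_param: int) -> int:
    """Bytes needed to hold the weights alone at the given precision."""
    return params * bits_per_param // 8


def nodes_needed(params: int, bits_per_param: int, mem_per_node_bytes: int) -> int:
    """Minimum node count to fit the weights, using ceiling division."""
    total = weight_bytes(params, bits_per_param)
    return -(-total // mem_per_node_bytes)


PARAMS = 5 * 10**12          # dense 5T-parameter model
NODE_MEM = 512 * 10**9       # ASSUMPTION: 512 GB usable memory per node

for bits in (16, 8, 4):
    tb = weight_bytes(PARAMS, bits) / 10**12
    n = nodes_needed(PARAMS, bits, NODE_MEM)
    print(f"{bits}-bit: {tb:.1f} TB of weights, at least {n} nodes")
```

Since a dense model touches every weight for every token, all of that memory has to be read on each forward pass. An MoE model only activates a fraction of its parameters per token, which is why the dense vs. MoE distinction matters here.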