You can run it locally too. Below are a few of my local models; this one comes in light compared to them, at roughly 60 GB at Q4. And because it's a MoE, most of it can sit in system memory with only the shared experts going to the GPU, so on a machine with reasonable memory bandwidth you can get solid performance (see the launch sketch after the listing). I'm running on GPUs; folks on Apple Silicon can run this with minimal effort if they have enough RAM.
126G /llmzoo/models/Qwen3-235B-InstructQ4
126G /llmzoo/models/Qwen3-235B-ThinkingQ4
189G /llmzoo/models/Qwen3-235B-InstructQ6
219G /llmzoo/models/glm-4.5-air
240G /llmzoo/models/Ernie
257G /llmzoo/models/Qwen3-Coder-480B
276G /llmzoo/models/DeepSeek-R1-0528-UD-Q3_K_XL.b.gguf
276G /llmzoo/models/DeepSeek-TNG
276G /llmzoo/models/DeepSeek-V3-0324-UD-Q3_K_XL.gguf
422G /llmzoo/models/KimiK2
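
For reference, a minimal llama.cpp launch along these lines (the model path and context size below are placeholders for your own setup, not my exact command). The --override-tensor (-ot) regex keeps the routed expert tensors in system RAM, while the attention layers and shared experts go to the GPU via -ngl:

    # routed expert weights (ffn_*_exps) stay in system RAM;
    # everything else is offloaded to the GPU
    llama-server -m /llmzoo/models/<your-model>.gguf \
      -ngl 99 \
      -ot ".ffn_.*_exps.=CPU" \
      -c 32768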