You can run it locally too. Below are a few of my local models; this one comes in light compared to them. At Q4 it's roughly 60 GB. And since it's a MoE, most of the weights can sit in system memory with only the shared experts going to the GPU, so provided your system has decent memory bandwidth you can get respectable performance (see the sketch after the listing). I'm running on GPUs; folks on Apple silicon can run it with minimal effort if they have enough RAM.

  126G /llmzoo/models/Qwen3-235B-InstructQ4
  126G /llmzoo/models/Qwen3-235B-ThinkingQ4
  189G /llmzoo/models/Qwen3-235B-InstructQ6
  219G /llmzoo/models/glm-4.5-air
  240G /llmzoo/models/Ernie
  257G /llmzoo/models/Qwen3-Coder-480B
  276G /llmzoo/models/DeepSeek-R1-0528-UD-Q3_K_XL.b.gguf
  276G /llmzoo/models/DeepSeek-TNG
  276G /llmzoo/models/DeepSeek-V3-0324-UD-Q3_K_XL.gguf
  422G /llmzoo/models/KimiK2
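
For the GPU-plus-system-RAM split, here's a minimal sketch of one way to do it with llama.cpp, assuming a recent build that has the -ot/--override-tensor flag; the model path, layer count, context size, and tensor regex are placeholders to adjust for your own hardware:

  # Keep the routed-expert tensors in system RAM, everything else on the GPU.
  # -ngl 999            : offload all layers that fit onto the GPU
  # -ot "...=CPU"       : pin tensors matching the pattern (ffn_*_exps) to CPU/system RAM
  # -c 32768            : context size, tune to taste
  ./llama-server -m /llmzoo/models/your-moe-model-Q4.gguf \
      -ngl 999 -ot "\.ffn_.*_exps\.=CPU" -c 32768

The same idea carries over to other runtimes: the bulk of the parameters (the routed experts) live in system RAM, while the attention and shared-expert weights stay resident on the GPU, which is why memory bandwidth matters more than VRAM here.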