People thinking about self-hosting Kimi K2.6 should be prepared for how big it is.

The Q8_K_XL quantization, for instance, is around 600GB on disk. I would guess roughly 700GB of VRAM is needed once you add KV cache and runtime overhead on top of the weights.
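As a very rough back-of-the-envelope check (all assumed numbers, not measurements), the weights set the floor and KV cache plus runtime overhead sit on top:

  # Rough VRAM estimate for serving a ~600GB quantized model.
  # Every figure here is an assumption for illustration, not a measurement.
  weights_gb = 600           # Q8 weights load more or less 1:1 into VRAM
  kv_cache_gb = 60           # assumed budget for long contexts / a few slots
  runtime_overhead_gb = 40   # assumed activations, buffers, fragmentation

  print(weights_gb + kv_cache_gb + runtime_overhead_gb, "GB")  # ~700 GB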

Quantizations below Q8 probably lose too much quality to be worth it.

Or 2.05TB on disk for the full precision GGUF.

https://huggingface.co/unsloth/Kimi-K2.6-GGUF

If you can afford the hardware to run Kimi K2.6 at any decent speed for more than 1 simultaneous user, you probably have a whole team of people on staff who are already very familiar with how to benchmark it vs Claude, GPT-5.5, etc.

While most people would not be able to run Kimi K2.6 fast enough for interactive chat, for a coding assistant low speed matters much less, especially when many tasks can be batched so they all make progress during a single pass over the weights.
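A minimal sketch of what that batching looks like, assuming a local OpenAI-compatible server (llama.cpp's server, vLLM, etc.) at a hypothetical localhost endpoint with a placeholder model name; the point is just that concurrent requests let the server batch them:

  # Batch many coding tasks against a locally hosted model.
  # Assumes an OpenAI-compatible server is already running; the URL and
  # model name below are placeholders, not a specific deployment.
  import json
  import urllib.request
  from concurrent.futures import ThreadPoolExecutor

  URL = "http://localhost:8080/v1/chat/completions"

  def run_task(prompt):
      body = json.dumps({
          "model": "kimi-k2.6",  # placeholder
          "messages": [{"role": "user", "content": prompt}],
      }).encode()
      req = urllib.request.Request(URL, data=body,
                                   headers={"Content-Type": "application/json"})
      with urllib.request.urlopen(req) as resp:
          return json.load(resp)["choices"][0]["message"]["content"]

  tasks = ["Refactor module A", "Add tests for module B", "Fix the bug in module C"]
  # Submitting the tasks concurrently lets the server batch them, so they all
  # advance on each pass over the weights instead of running one after another.
  with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
      for prompt, result in zip(tasks, pool.map(run_task, tasks)):
          print(prompt, "->", result[:80])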

If you run it on your own hardware, you can keep it busy 24/7 without worrying about token prices or subscription limits, so you can likely get more work done even on much slower hardware. Customizing an open-source harness can also be far more efficient than something like Claude Code.

For any serious application, you might be more limited by your ability to review the code than by hardware speed.

DeepSeek V4 Pro is far more effective at batching multiple tasks together because its KV cache is so much lighter: a maximum of ~10GB at the full 1M context, scaling linearly with context length according to the DeepSeek V4 release paper. That's extremely impressive; it unlocks batching, agent swarms, etc. even on severely memory-constrained platforms, especially at smaller max context.
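Taking those quoted figures at face value (~10GB at a 1M-token context, linear in context length), the per-sequence cost at smaller contexts follows directly:

  # KV cache sizes derived only from the numbers quoted above.
  max_cache_gb, max_context = 10, 1_000_000
  bytes_per_token = max_cache_gb * 1e9 / max_context   # ~10 KB per token
  for ctx in (32_000, 128_000, 256_000):
      print(f"{ctx:>9,} tokens -> ~{bytes_per_token * ctx / 1e9:.2f} GB per sequence")
  # ~0.32 GB at 32K and ~1.28 GB at 128K: dozens of parallel agents fit in
  # the memory a single long-context sequence would otherwise need.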

Kimi is a natively quantized model; the lossless full-precision release is 595GB. Your own link mentions that.

So, realistically, $100K for an 8x RTX 6000 Pro system that can run it at a usable rate.
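A quick check on that configuration, under the assumption of 96GB of VRAM per RTX 6000 Pro:

  # Does 8x 96GB cover the natively quantized weights? (96GB per card assumed.)
  cards, vram_per_card_gb = 8, 96
  total_vram_gb = cards * vram_per_card_gb   # 768 GB
  weights_gb = 595                           # native-precision release size quoted above
  print(total_vram_gb - weights_gb, "GB left for KV cache and runtime")  # ~173 GB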

I think people will always disagree on what qualifies as a "usable rate". But keep in mind that practically no one sensible is running the latest Opus or GPT around the clock, especially not at sustainable, unsubsidized prices. With open-weight models you actually can do that.

Also, for people doing something medical, privacy-sensitive, or otherwise involving sensitive data, there's an almost incalculable value (depending on the industry niche) in having absolutely no external network traffic to any servers or systems you don't fully control.

The 'unsloth' link above is a third party that has quantized it to Q8; the original release is considerably larger than 600GB:

https://huggingface.co/moonshotai/Kimi-K2.6

No.

I have downloaded Kimi-K2.6 (the original release).

  du -sh moonshotai/Kimi-K2.6 
  555G moonshotai/Kimi-K2.6

  du -s moonshotai/Kimi-K2.6 
  581255612 moonshotai/Kimi-K2.6
For comparison (sorted by decreasing size: three bigger models and three smaller models, all recently launched):

  du -sh zai-org/GLM-5.1
  1.4T zai-org/GLM-5.1
  du -sh XiaomiMiMo/MiMo-V2.5-Pro 
  963G XiaomiMiMo/MiMo-V2.5-Pro
  du -sh deepseek-ai/DeepSeek-V4-Pro
  806G deepseek-ai/DeepSeek-V4-Pro

  du -sh XiaomiMiMo/MiMo-V2.5 
  295G XiaomiMiMo/MiMo-V2.5
  du -sh MiniMaxAI/MiniMax-M2.7
  215G MiniMaxAI/MiniMax-M2.7
  du -sh deepseek-ai/DeepSeek-V4-Flash
  149G deepseek-ai/DeepSeek-V4-Flash

That page mentions that the model is natively INT4 for most of the parameters, and ~600GB is in the ballpark of what's available there for download.
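Two quick consistency checks on those numbers, assuming roughly 1T total parameters (in line with earlier Kimi K2 releases; the model card may say otherwise):

  # 1) du -sh reports binary GiB, so the "555G" above is 581255612 KiB,
  #    i.e. ~595 GB in decimal, matching the 595GB lossless figure quoted earlier.
  print(f"{581_255_612 * 1024 / 1e9:.0f} GB")   # ~595 GB

  # 2) Assuming ~1T total parameters, native INT4 alone gives a ~500GB floor;
  #    keeping some tensors at higher precision plausibly makes up the rest.
  print(f"{1.0e12 * 0.5 / 1e9:.0f} GB")         # ~500 GB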