Hacker News

theanonymousone 5 hours ago

On OpenRouter, the Moonshot provider for Kimi K2.7 Code carries an "int4" tag. Isn't that too low, especially coming from the model's own developer? Is that a mistake? How is it served through their direct API?

kouteiheika 5 hours ago

The model is natively quantized (i.e. it was trained that way in the first place, so this is not a post-training quantization, which would degrade performance).
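To make "trained that way" concrete: in quantization-aware training the forward pass typically sees weights that have been rounded to the int4 grid and scaled back ("fake quantization"), so the optimizer learns weights that tolerate the rounding. Below is a minimal pure-Python sketch, assuming symmetric per-group quantization with one scale per group. The group size and the function names `int4_quantize` and `fake_quantize` are illustrative, not Moonshot's actual recipe, and gradient handling (e.g. a straight-through estimator) is omitted.

```python
def int4_quantize(group):
    """Symmetric int4 quantization of one group of weights.

    Returns (codes, scale), where each code is an integer in [-8, 7].
    """
    max_abs = max(abs(w) for w in group)
    # Map the largest magnitude to 7; fall back to 1.0 for an all-zero group.
    scale = max_abs / 7 if max_abs > 0 else 1.0
    codes = [max(-8, min(7, round(w / scale))) for w in group]
    return codes, scale


def fake_quantize(weights, group_size=4):
    """Quantize then dequantize: what a QAT forward pass would compute with.

    group_size=4 is an illustrative choice; real schemes use larger groups.
    """
    out = []
    for start in range(0, len(weights), group_size):
        codes, scale = int4_quantize(weights[start:start + group_size])
        out.extend(c * scale for c in codes)
    return out


weights = [0.7, -0.35, 0.1, 0.0, 1.4, -1.4, 0.2, 0.05]
print(fake_quantize(weights))
```

Each reconstructed weight lands within half a quantization step of the original, and that step is per group, which is why a single outlier only costs precision for its own group.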

knollimar 2 hours ago

Isn't it only partially quantized? I thought some dense parts were kept in higher precision, but most of the weights are int4?

theanonymousone 4 hours ago

But the Hugging Face page lists BF16, F16, and I32 tensors?

kouteiheika 3 hours ago

Not every weight is quantized. For example, weights that don't take up much space or are especially sensitive are left in higher precision. State-of-the-art weight quantization is never applied uniformly (i.e. to all weights and in the same way).
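A sketch of what "not every weight is quantized" can look like in a conversion script: a policy that keeps small or sensitive tensors in BF16 and sends the large ones to int4. The tensor names, the size threshold, the keep-list, and `choose_dtype` itself are all hypothetical and the shapes are illustrative; the actual split in this checkpoint is whatever its own config specifies.

```python
# Hypothetical name fragments for tensors usually kept in higher precision.
KEEP_HIGH_PRECISION = ("embed", "lm_head", "norm", "router", "gate.weight")


def choose_dtype(name, num_params, min_params=1_000_000):
    """Pick a storage precision for one tensor (illustrative policy only)."""
    if any(fragment in name for fragment in KEEP_HIGH_PRECISION):
        return "bf16"  # sensitive to rounding error
    if num_params < min_params:
        return "bf16"  # too small for int4 to save meaningful space
    return "int4"


# Illustrative tensor names and parameter counts, not taken from any real model.
tensors = {
    "model.embed_tokens.weight": 32_000 * 4_096,
    "model.layers.3.input_layernorm.weight": 4_096,
    "model.layers.3.mlp.gate.weight": 64 * 4_096,
    "model.layers.3.mlp.experts.17.down_proj.weight": 4_096 * 1_024,
}
for name, n in tensors.items():
    print(f"{name}: {choose_dtype(name, n)}")
```

In this sketch only the large expert projection ends up as int4. In a mixture-of-experts model those expert matrices make up the bulk of the parameters, so the file is "mostly int4" even though several tensor dtypes appear in it.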

zackangelo 2 hours ago

I don't believe safetensors has a native int4 dtype, so they packed 4 int4s into a bf16 in this checkpoint.
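Packing is just bit manipulation. Below is a minimal sketch, assuming the layout described in the comment: four signed int4 values per 16-bit word, lowest nibble first. The nibble order, the container dtype, and the helper names `pack_int4` and `unpack_int4` are assumptions, and the real checkpoint may differ. The words produced here are raw 16-bit patterns that a bf16 tensor would merely reinterpret bit for bit. The same scheme with eight nibbles fills a 32-bit word.

```python
def pack_int4(values):
    """Pack signed int4 values (in [-8, 7]) into 16-bit words, 4 per word.

    Nibble i of word k holds values[4*k + i] (lowest nibble first; assumed order).
    """
    if len(values) % 4:
        raise ValueError("length must be a multiple of 4")
    words = []
    for k in range(0, len(values), 4):
        word = 0
        for i, v in enumerate(values[k:k + 4]):
            if not -8 <= v <= 7:
                raise ValueError(f"{v} does not fit in int4")
            word |= (v & 0xF) << (4 * i)  # store the two's-complement nibble
        words.append(word)
    return words


def unpack_int4(words):
    """Inverse of pack_int4: recover the signed int4 values."""
    values = []
    for word in words:
        for i in range(4):
            nibble = (word >> (4 * i)) & 0xF
            values.append(nibble - 16 if nibble >= 8 else nibble)  # sign-extend
    return values


codes = [-8, 7, 0, -1, 3, -3, 5, 2]
packed = pack_int4(codes)
print([hex(w) for w in packed])  # → ['0xf078', '0x25d3']
assert unpack_int4(packed) == codes
```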