Hacker News

If you don't mind a stupid question, is this essentially dynamic quantization? I'm trying to understand how this is different from using a regular quantized model to squeeze more parameters into less RAM.