https://docs.vllm.ai/en/v0.20.0/api/vllm/model_executor/laye...
`vllm.model_executor.layers.quantization.turboquant`
> The technique implemented here is the scalar case of the HIGGS quantization method (Malinovskii et al., "Pushing the Limits of Large Language Model Quantization via the Linearity Theorem", NAACL 2025; preprint arXiv:2411.17525): rotation + optimized grid + optional re-normalization, applied to KV-cache compression. The first application of this approach to KV-cache compression appears in "Cache Me If You Must: Adaptive Key-Value Quantization for Large Language Models" (Shutova et al., ICML 2025; preprint arXiv:2501.19392). Both of these references pre-date the TurboQuant paper (Zandieh et al., ICLR 2026).
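For concreteness, here is a minimal sketch of that rotation + optimized grid + optional re-normalization pipeline, assuming a random orthogonal rotation and a generic scalar codebook. The names (`quantize_kv`, `grid`, `rot`) are illustrative and not from vLLM's actual `turboquant` API:

```python
import torch

def quantize_kv(x: torch.Tensor, grid: torch.Tensor, rot: torch.Tensor,
                renormalize: bool = True) -> torch.Tensor:
    """x: (..., d) keys/values; grid: sorted (k,) scalar codebook; rot: (d, d) orthogonal."""
    z = x @ rot.T                                        # rotate: coordinates become ~Gaussian
    idx = (z.unsqueeze(-1) - grid).abs().argmin(dim=-1)  # nearest grid point per coordinate
    z_hat = grid[idx]                                    # dequantized rotated vector
    if renormalize:                                      # optional re-normalization step:
        norm = z.norm(dim=-1, keepdim=True)              # restore each vector's original norm
        z_hat = z_hat * norm / z_hat.norm(dim=-1, keepdim=True).clamp_min(1e-12)
    return z_hat @ rot                                   # rotate back to the original basis

# Usage with stand-in choices: QR of a Gaussian matrix as the rotation, and a
# uniform 2-bit grid (a real grid would be MSE-optimized for N(0, 1)).
d = 128
rot, _ = torch.linalg.qr(torch.randn(d, d))
grid = torch.tensor([-1.5, -0.5, 0.5, 1.5])
x_hat = quantize_kv(torch.randn(4, d), grid, rot)
```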
Those works did cite DRIVE/EDEN :)
HIGGS is an extension of EDEN (using the well-known blockwise Lloyd-Max method).
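For readers unfamiliar with it, a sketch of the empirical Lloyd-Max iteration (1-D k-means on Gaussian samples), which yields the MSE-optimal scalar grid for the approximately Gaussian post-rotation coordinates; function name and defaults here are illustrative only:

```python
import torch

def lloyd_max_grid(bits: int = 2, n_samples: int = 200_000, iters: int = 50) -> torch.Tensor:
    """Alternate nearest-center assignment and centroid updates on N(0, 1) samples."""
    samples = torch.randn(n_samples)
    centers = torch.linspace(-2.0, 2.0, 2 ** bits)       # initial grid
    for _ in range(iters):
        idx = (samples[:, None] - centers).abs().argmin(dim=1)  # assign to nearest center
        for j in range(len(centers)):                    # centroid update per cell
            cell = samples[idx == j]
            if cell.numel() > 0:
                centers[j] = cell.mean()
        centers, _ = centers.sort()
    return centers
```

The same grid is then reused blockwise across all (rotated) coordinates, which is what makes it a scalar rather than a vector quantizer.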
The proper framing of this "TurboQuant" layer in vLLM (which does not include JQL) is precisely EDEN '22 without the scale correction.
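To make the "without the scale correction" distinction concrete: a hedged sketch, assuming the correction takes the usual unbiasedness form s = ‖z‖² / ⟨z, ẑ⟩ applied at decode time (see the EDEN paper for the exact expression); `dequantize` is a hypothetical helper, not vLLM code:

```python
import torch

def dequantize(z: torch.Tensor, z_hat: torch.Tensor, scale_correction: bool) -> torch.Tensor:
    """z: rotated vector; z_hat: its nearest-grid reconstruction."""
    if not scale_correction:                          # the framing above: EDEN minus this step
        return z_hat                                  # plain (biased) nearest-grid decode
    inner = (z * z_hat).sum(dim=-1, keepdim=True)     # <z, z_hat>; positive for any sane grid
    return z_hat * z.square().sum(dim=-1, keepdim=True) / inner  # per-vector rescale
```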
EDEN is clearly relevant prior work for HIGGS. But reducing HIGGS to “an extension of EDEN” seems unfair to the authors of HIGGS. Similar primitive, different problem setting, different constraints, different contribution.
Curious: where do you draw the line between “related prior work” and “an extension of EDEN”?