There are also more papers on similar themes.

For example, TurboQuant makes use of QJL (quantized Johnson-Lindenstrauss transforms). One of the first papers to characterize QJL, and in fact the rate-distortion tradeoff for quantized matrix multiplication in general, is "Optimal Quantization for Matrix Multiplication" (https://arxiv.org/abs/2410.13780) by Ordentlich and Polyanskiy.
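For readers unfamiliar with the idea, here is a minimal sketch of the basic QJL recipe (my own toy code, not TurboQuant's or either paper's implementation): project with a random Gaussian matrix, keep only the sign bits on one side, and rescale to get an unbiased inner-product estimate.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 8, 100_000        # dimension, number of projections (large, for a tight estimate)
x = np.array([1.0, 0, 0, 0, 0, 0, 0, 0])
y = np.array([0.6, 0.8, 0, 0, 0, 0, 0, 0])   # true <x, y> = 0.6

S = rng.standard_normal((m, d))   # random JL projection
bits = np.sign(S @ x)             # 1-bit quantized sketch of x

# For s ~ N(0, I): E[sign(<s,x>) * <s,y>] = sqrt(2/pi) * <x,y> / ||x||,
# so rescaling by ||x|| * sqrt(pi/2) gives an unbiased inner-product estimator.
est = np.linalg.norm(x) * np.sqrt(np.pi / 2) * np.mean(bits * (S @ y))
print(est)   # close to 0.6
```

The point of the sketch is just the structure (random projection, then extreme quantization, then a distribution-aware rescaling); the papers above analyze how well such schemes can possibly do.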

There is also a more accessible survey on quantized matrix multiplication, "High-Rate Quantized Matrix Multiplication: Theory and Practice" (https://arxiv.org/abs/2601.17187), by the same authors.

TurboQuant cites neither of them.

TurboQuant is starting to look like a case study in how to turn a fragile paper into a breakthrough story.

The attribution is thin, the “6x compression” headline is not clearly separated from prior KV-cache quantization baselines like KIVI, and the RaBitQ comparison is hard to take seriously: a single-core CPU for the baseline versus an A100 GPU for TurboQuant. It is comparing apples to datacenters. Worse, there are public OpenReview comments saying that even the reported accuracy results are not reproducible.

Hard to believe this is the standard for something being promoted as a breakthrough. If this came from a random startup blog, people would be much harsher about it.

I believe our claim at this point is more fundamental than a mere lack of citation.

The quantizer in TurboQuant is EDEN quantization (2021) applied to the KV-cache. It is neither a novel quantizer nor an improvement over existing quantization techniques.

In DRIVE/EDEN, we already introduced the version used in the TurboQuant paper and suggested optimal scale configurations that are better in both the MSE-minimizing and unbiased settings.

Wow, yes - you are completely correct (I have now read through the note in detail).

Though, as your paper also notes, the quantizer values themselves aren't fundamentally novel to either paper. Lloyd-Max scalar quantizers have been studied for a very, very long time, and the specific Lloyd-Max values for a Gaussian input distribution have been derived in many papers across signal processing and information theory.
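To make that concrete, here is a toy sketch (my own code, not from any of the papers discussed) of Lloyd's algorithm recovering the classical Lloyd-Max levels for a standard Gaussian. For a 4-level (2-bit) quantizer the known optimal levels are roughly ±0.4528 and ±1.510.

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.standard_normal(500_000)      # stand-in for the Gaussian source
levels = np.array([-1.5, -0.5, 0.5, 1.5])   # symmetric initial codebook

for _ in range(50):
    # Partition step: nearest-neighbor cells, so boundaries are midpoints of levels.
    edges = (levels[:-1] + levels[1:]) / 2
    cells = np.digitize(samples, edges)
    # Centroid step: each level moves to the conditional mean of its cell.
    levels = np.array([samples[cells == k].mean() for k in range(len(levels))])

print(levels)   # approximately [-1.51, -0.453, 0.453, 1.51]
```

The fixed point is the textbook 2-bit Gaussian Lloyd-Max codebook (Max, 1960), which is the sense in which the level values themselves long predate both papers.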

Thanks for that!

It is worth noting that taking advantage of the post-rotation distribution was not actually done until DRIVE (2021), which was made possible via our proper scaling. Furthermore, applying a Lloyd-Max codebook post-rotation was introduced in EDEN.

We consider these to be the foundational works in this regard.
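For anyone following along, a minimal sketch of the rotate-then-scalar-quantize idea under discussion (toy code, not the DRIVE/EDEN implementation; the Haar-random rotation and the 1-bit level are my simplifying assumptions): a random rotation makes the coordinates of a fixed vector approximately i.i.d. Gaussian, after which a per-coordinate Gaussian-optimal quantizer applies.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 1024
x = np.zeros(d)
x[0] = 1.0   # a spike: worst case for naive per-coordinate quantization

# Approximately Haar-random rotation via QR of a Gaussian matrix.
R, _ = np.linalg.qr(rng.standard_normal((d, d)))
y = R @ x    # coordinates are approximately N(0, ||x||^2 / d)

# 1-bit Gaussian-optimal (Lloyd-Max) quantizer: levels +/- sqrt(2/pi) * sigma.
sigma = np.linalg.norm(x) / np.sqrt(d)
q = np.sign(y) * np.sqrt(2 / np.pi) * sigma
x_hat = R.T @ q   # rotate back to reconstruct

rel_mse = np.sum((x - x_hat) ** 2) / np.sum(x ** 2)
print(rel_mse)   # close to 1 - 2/pi, i.e. about 0.36
```

The relative error concentrates near 1 - 2/pi, the 1-bit Lloyd-Max distortion of a Gaussian source, regardless of the shape of the input vector; that is the sense in which the scheme "takes advantage of" the post-rotation distribution.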

> Thanks for that! It is worth noting that taking advantage of the post-rotation distribution

I again feel this claim is too strong. Rotations have been used in information theory and wireless communications for decades at this point, with appropriate scaling applied at channel inputs and outputs to hit channel capacity. The signals then pass through codebooks designed to take advantage of the post-rotated, whitened distribution.

Our cellphones today are powered by such technology.

I agree with your claim when restricted to deep learning. But I do not agree with the broad characterization that taking advantage of post-rotation distributions was only first done in your work.

[deleted]