> Thanks for that! It is worth noting that taking advantage of the post-rotation distribution […]

I again feel this claim is too strong. Rotations have been used in information theory and wireless communications for decades at this point, with appropriate scaling applied at the channel inputs and outputs to approach channel capacity. The signals then pass through codebooks designed to take advantage of the post-rotated, whitened signal.

Our cellphones today are powered by such technology.

I agree with your claim when restricted to deep learning. But I do not agree with the broad characterization that taking advantage of post-rotation distributions was first done in your work.

Thanks for the pushback, and I appreciate the reference to classical information theory.

While I probably overstated things by using the very general phrase "taking advantage," I want to be precise about the claim, as I believe these works are foundational to quantization beyond the scope of deep learning. The specific mechanism, applying a deterministic biased quantizer such as Lloyd-Max to the induced post-rotation distribution and then mathematically correcting its inherent bias, is a distinct contribution (and one that asymptotically improves the worst-case error).

If there is a classical paper that utilizes such a combination, I would genuinely be very eager to review it. But to my knowledge, this was not introduced prior to DRIVE and EDEN.
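To make the mechanism under discussion concrete, here is a minimal sketch. It is not the DRIVE/EDEN algorithm itself: the QR-based rotation, the 1-bit sign quantizer (standing in for Lloyd-Max), and the particular per-vector scale are all illustrative simplifications. The point is only the combination being debated: a deterministic biased quantizer applied to the approximately Gaussian post-rotation coordinates, followed by an explicit correction so the estimate's inner product with the original vector is preserved.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_rotation(d, rng):
    # Random orthogonal matrix from the QR decomposition of a Gaussian matrix;
    # the sign fix makes the distribution closer to Haar-uniform.
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    return q * np.sign(np.diag(r))

d = 512
x = rng.standard_normal(d) * 3.0

R = random_rotation(d, rng)
z = R @ x  # post-rotation coordinates are approximately i.i.d. Gaussian

# Deterministic 1-bit quantizer: biased on its own.
q = np.sign(z)

# Bias correction: choose a per-vector scale s so that
# <x_hat, x> = s * <q, z> equals ||z||^2 = ||x||^2 exactly.
s = np.dot(z, z) / np.dot(q, z)

# De-rotate the corrected quantized vector.
x_hat = R.T @ (s * q)

rel_err = np.linalg.norm(x_hat - x) / np.linalg.norm(x)
```

With this scale the reconstruction is exactly aligned with `x` in inner product, and for 1 bit per coordinate the relative error concentrates around `sqrt(pi/2 - 1) ≈ 0.76` as the dimension grows; the classical combinations cited above (scaling plus capacity-achieving codebooks) operate on the same whitened signal but without this quantizer-plus-debiasing step.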
