Is there an error in the visualization? It shows that every vector is rotated the same amount. My understanding was that they are randomized with different values, which results in a predictable distribution, which is easier to quantize.
That's actually correct and intentional. TurboQuant applies the same rotation matrix to every vector. The key insight is that any unit vector, when multiplied by a random orthogonal matrix, produces coordinates with a known distribution (Beta/arcsine in 2D, near-Gaussian in high-d). The randomness is in the matrix itself (generated once from a seed), not per-vector. Since the distribution is the same regardless of the input vector, a single precomputed quantization grid works for everything. I've updated the description to make this clearer.
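Here's a rough numpy sketch of that property (the dimensions and the QR-based matrix construction are illustrative, not taken from the paper): one orthogonal matrix generated from a seed, applied to two very different unit vectors, yields coordinates with the same input-independent statistics.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64

# One random orthogonal matrix, generated once from a seed (QR of a Gaussian).
Q, R = np.linalg.qr(rng.standard_normal((d, d)))
Q *= np.sign(np.diag(R))  # fix column signs so Q is uniformly (Haar) distributed

# Two very different unit vectors: a standard basis vector and a flat vector.
e1 = np.zeros(d)
e1[0] = 1.0
flat = np.ones(d) / np.sqrt(d)

# After rotation, each coordinate of a unit vector behaves like N(0, 1/d) for
# large d, regardless of which unit vector we started with. The rotation also
# preserves norms, so the result stays on the unit sphere.
for v in (e1, flat):
    coords = Q @ v
    print(round(float(np.linalg.norm(coords)), 6), round(float(coords.std()), 3))
```

Both vectors come out with norm 1 and a coordinate spread near 1/sqrt(d), which is why one precomputed grid can serve every input.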
Thanks. However, from this visualization it's not clear how the random rotation is beneficial. I guess it makes more sense on higher dimensional vectors.
I believe they are all rotated by the same random matrix; the purpose (IIUC) is to distribute the signal evenly across all dimensions, effectively drowning any structure that might be present in noise. That's essential for data efficiency, and it also avoids bias-related issues during the initial quantization step. There are still some remaining bias issues, which are addressed by a second quantization step involving the residual.
That said, I don't believe the visualization is correct. The grid for one doesn't seem to match what's described in the paper.
Also it's entirely possible I've misunderstood or neglected to notice key details.
Awesome! So it nudges the vectors onto stepped polar rays... it's effectively angle snapping, plus a sort of magnitude clustering?
Good post, but the link at the end is broken:
"For the full technical explanation with equations, proofs, and PyTorch pseudocode, see the companion post: TurboQuant: Near-Optimal Vector Quantization Without Looking at Your Data."
Author here. Sorry, I'm still refining the companion post. I'll share it once it's ready.
I like the visualization, but I don't understand the grid quantization. If every point is on the unit circle, aren't all the grid coordinates near the center unused?
Yes, great catch. I simplified the grid just for visualization purposes.
I've updated the visualization. The grid is actually not uniformly spaced. Each coordinate is quantized independently using optimal centroids for the known coordinate distribution. In 2D, unit-circle coordinates follow the arcsine distribution (concentrating near ±1), so the centroids cluster at the edges, not the center.
Yeah that's odd. It seems like you'd want an n-1 dimensional grid on the surface of the unit sphere rather than an n dimensional grid within which the sphere resides.
Looking at the paper (https://arxiv.org/abs/2504.19874) they cite earlier work that does exactly that. They object that grid projection and binary search perform exceptionally poorly on the GPU.
I don't think they're using a regular grid as depicted on the linked page; Equation 4 in the paper gives the centroids for the MSE-optimal quantizer.
Why specify MSE-optimal? It turns out there are actually two quantization steps, a detail also omitted from the linked page: they apply QJL quantization to the residual of the grid-quantized data.
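Roughly, the two-stage idea as I understand it looks like the sketch below. Everything here is an illustrative stand-in, not the paper's actual construction: the uniform `levels` replace the Equation 4 centroids, and `residual_bits` is only a generic sign-of-random-projection step in the spirit of QJL, whose details differ in the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
d, m = 64, 256  # m = number of 1-bit measurements for the residual (arbitrary)

# Stage 1: a simple per-coordinate grid quantizer (stand-in for the paper's
# MSE-optimal centroids from Equation 4).
levels = np.linspace(-1, 1, 8)

def grid_quantize(v):
    # Snap each coordinate to its nearest level.
    return levels[np.abs(v[:, None] - levels[None, :]).argmin(axis=1)]

# Stage 2: 1-bit quantization of the residual via a random projection,
# keeping only the signs.
S = rng.standard_normal((m, d))

def residual_bits(r):
    return np.sign(S @ r)

v = rng.standard_normal(d)
v /= np.linalg.norm(v)

q = grid_quantize(v)     # coarse code from the grid
r = v - q                # what the grid quantizer missed
bits = residual_bits(r)  # one sign bit per random measurement

# The stored code is the coarse grid indices plus m sign bits; the sign bits
# let a decoder estimate the residual direction and correct the coarse code.
print(q.shape, bits.shape)
```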
My description is almost certainly missing key details; I'm not great at math and this is sufficiently dense to be a slog.
I think the grid can be on the surface of the unit sphere.