Hacker News

From your second paper:

  > In particular, we can generate fixed random rotation matrices at initialization, and multiply them into the activations any time we read from or write to the residual stream.

I guess I was mistaken in assuming this was one of the TurboQuant-specific innovations. Still an interesting concept, though.
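
For anyone who wants to see the quoted idea concretely, here is a rough sketch of how I read it; it is not code from the paper. It draws one fixed random orthogonal matrix up front, keeps the residual stream in the rotated basis, undoes the rotation on every read, and applies it on every write. The names `random_rotation`, `toy_layer`, `run_plain`, and `run_rotated` are made up for illustration, and `toy_layer` is a hypothetical stand-in for a real attention or MLP block.

```python
import math
import random


def random_rotation(d, seed=0):
    """Fixed random orthogonal d x d matrix (Gram-Schmidt on Gaussian rows)."""
    rng = random.Random(seed)
    rows = []
    while len(rows) < d:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for u in rows:  # remove the components along earlier rows
            dot = sum(a * b for a, b in zip(v, u))
            v = [a - dot * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-8:  # skip the (very unlikely) degenerate draw
            rows.append([a / norm for a in v])
    return rows


def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


def toy_layer(x):
    # Hypothetical block: elementwise, so it depends on the basis it sees.
    return [math.tanh(2.0 * xi) for xi in x]


def run_plain(x, n_layers):
    """Ordinary residual stream: x <- x + layer(x)."""
    for _ in range(n_layers):
        out = toy_layer(x)
        x = [a + b for a, b in zip(x, out)]
    return x


def run_rotated(x, R, n_layers):
    """Same computation, with the stream stored in the rotated basis."""
    Rt = transpose(R)
    s = matvec(R, x)  # initial write into the stream: rotate
    for _ in range(n_layers):
        h = matvec(Rt, s)            # read: undo the rotation
        w = matvec(R, toy_layer(h))  # write: rotate the layer output
        s = [a + b for a, b in zip(s, w)]
    return matvec(Rt, s)             # final read


R = random_rotation(6, seed=1)
x0 = [0.5, -1.0, 2.0, 0.0, 3.0, -0.25]
plain = run_plain(x0, 3)
rotated = run_rotated(x0, R, 3)
print(max(abs(p - q) for p, q in zip(plain, rotated)))  # floating-point noise only
```

Because the transpose of an orthogonal matrix is its inverse, the two runs agree up to rounding: the layers never see the rotation, only the stored stream is in a different basis. Strictly speaking, Gram-Schmidt gives an orthogonal matrix that may include a reflection, which should not matter for this purpose.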