I think that's in the paper https://arxiv.org/pdf/2505.16932
The authors are kind enough to include the code in the paper, so I don't have to understand the math behind it, which goes way over my head. That also means I don't know how to extend it to the inverse square root.