Great writeup. Are there any libraries that implement some of the methods described?
ScalarLM uses tokenformer adaptors by default, which have learnable key/values
https://www.scalarlm.com/blog/tokenformer-a-scalable-transfo...
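For anyone wondering what "learnable key/values" means in practice, here is a rough NumPy sketch of the general idea: the input tokens act as queries and attend over a small set of trainable parameter tokens. This is not ScalarLM's code or API. The class name `LearnableKVAdaptor`, the residual connection, the zero-initialized values, and the plain softmax are all assumptions made for illustration (the Tokenformer paper uses a modified normalization instead of standard softmax).

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

class LearnableKVAdaptor:
    """Hypothetical adaptor: inputs are queries, keys/values are parameters."""

    def __init__(self, d_model, num_param_tokens, seed=0):
        rng = np.random.default_rng(seed)
        # Trainable key tokens, small random init (assumption).
        self.keys = rng.normal(0.0, 0.02, (num_param_tokens, d_model))
        # Trainable value tokens, zero init so the adaptor starts as identity (assumption).
        self.values = np.zeros((num_param_tokens, d_model))

    def __call__(self, x):
        # x: (seq_len, d_model). Score each input token against every key token.
        scores = x @ self.keys.T / np.sqrt(x.shape[-1])
        weights = softmax(scores, axis=-1)
        # Mix the value tokens and add the result back to the input.
        return x + weights @ self.values

adaptor = LearnableKVAdaptor(d_model=8, num_param_tokens=16)
x = np.random.default_rng(1).normal(size=(4, 8))
y = adaptor(x)
```

In a real setup the keys and values would be trained with gradients while the base model stays frozen, and capacity can be grown by appending more key/value rows.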