ScalarLM uses tokenformer adaptors by default, which have learnable key/values

https://www.scalarlm.com/blog/tokenformer-a-scalable-transfo...