thanks! we explain how it scales to larger models in the last section of the OP blog post

Shame you stopped short of actually benchmarking that scale though, eh?

will do - we are a small team and it takes time to implement and optimize a new model, whatever the size.