Could be amazing, but it's hard to judge whether it will really work with, say, a 27B model or larger. We can already get pretty good speed with a 2B model.

thanks! we explain how it scales to larger models in the last section of the OP blog post

Shame you stopped short of actually benchmarking that scale though, eh?

will do - we are a small team and it takes time to implement and optimize a new model, whatever the size.