Could be amazing, but it's hard to judge whether it will really work with, say, a 27B model or larger. We can already get pretty good speed with a 2B model.
thanks! we explain how it scales to larger models in the last section of the OP blog post
Shame you stopped short of actually benchmarking that scale though, eh?
will do - we are a small team and it takes time to implement and optimize a new model, whatever the size.