Not necessarily with speculative decoding. Whitespace would be trivial to predict and they would petty much keep using the same amount of compute as before.
I don't think that's their primary motive for doing this but it is a side effect.
Not necessarily with speculative decoding. Whitespace would be trivial to predict and they would petty much keep using the same amount of compute as before.
I don't think that's their primary motive for doing this but it is a side effect.