Can you use the smaller Gemma 4B model as the draft model for speculative decoding with the larger 31B model?
Why/why not?