Is this newer/better than the speculative decoding from 2022? https://arxiv.org/abs/2211.17192
That paper is cited in the 'introduction' and 'background' sections. This paper builds on it by removing some bottlenecks.
It seems they focus on improving the drafter and the verification policy so that speculation keeps producing net speedups, rather than wasted verification work, at DeepSeek scale.
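For intuition on "net speedup vs. wasted verification work", here is a rough sketch of the speedup estimate from the 2022 paper, not anything specific to the new one. It assumes each drafted token is accepted independently with probability `alpha`, that `gamma` tokens are drafted per round, and that one draft step costs `c` target-model steps. The function names are mine and purely illustrative.

```python
def expected_tokens_per_pass(alpha: float, gamma: int) -> float:
    """Expected tokens emitted per target-model verification pass.

    Assumes i.i.d. acceptance with probability alpha for each of the
    gamma drafted tokens (a simplification; real acceptance rates vary).
    """
    if alpha >= 1.0:
        # Every draft token accepted, plus the one token the target adds.
        return float(gamma + 1)
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)


def estimated_speedup(alpha: float, gamma: int, c: float) -> float:
    """Estimated wall-clock speedup over plain autoregressive decoding.

    c is the cost of one draft step relative to one target step.
    One round costs gamma draft steps plus one target pass.
    """
    return expected_tokens_per_pass(alpha, gamma) / (gamma * c + 1.0)


if __name__ == "__main__":
    # Good drafter: most guesses accepted, speculation pays off.
    print(estimated_speedup(alpha=0.8, gamma=4, c=0.05))
    # Poor drafter: most verification work is wasted, net slowdown.
    print(estimated_speedup(alpha=0.1, gamma=8, c=0.2))
```

Under these assumptions, a value below 1.0 means speculation is a net loss, which is the failure mode a better drafter or verification policy is meant to avoid.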