Hacker News

Balinares 2 months ago

Isn't that exactly how draft models speed up inference, though? Validating a batch of drafted tokens takes a single parallel forward pass of the large model, which is significantly faster than generating them one at a time.
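
For anyone who hasn't seen the mechanism, here is a rough sketch of the greedy variant of draft-model (speculative) decoding. Everything in it is made up for illustration: `target_model` and `draft_model` are toy deterministic functions standing in for a large and a small LLM, and the "batched" verification is simulated with a plain loop, whereas a real transformer would score all drafted positions in one forward pass. Treat it as a sketch of the idea, not of any particular library's implementation.

```python
from typing import Callable, List, Tuple

# A "model" here is just a function from a context to its greedy next token.
Model = Callable[[List[int]], int]


def target_model(ctx: List[int]) -> int:
    # Hypothetical stand-in for the large model: a toy deterministic rule.
    return (sum(ctx) * 31 + len(ctx)) % 50


def draft_model(ctx: List[int]) -> int:
    # Hypothetical stand-in for the small model: agrees with the target
    # except when the context length is a multiple of 4.
    guess = target_model(ctx)
    return guess if len(ctx) % 4 else (guess + 1) % 50


def greedy_decode(model: Model, prompt: List[int], n: int) -> List[int]:
    # Baseline: one model call per generated token.
    out = list(prompt)
    for _ in range(n):
        out.append(model(out))
    return out


def speculative_decode(
    target: Model, draft: Model, prompt: List[int], n: int, k: int = 4
) -> Tuple[List[int], int]:
    out = list(prompt)
    goal = len(prompt) + n
    target_passes = 0
    while len(out) < goal:
        # 1. The draft proposes k tokens sequentially (assumed cheap).
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))

        # 2. The target scores all k+1 positions. Simulated with a loop here;
        #    counted as one pass because a real model would batch this.
        target_passes += 1
        verified = [target(out + proposal[:i]) for i in range(k + 1)]

        # 3. Accept the longest prefix where draft and target agree, then
        #    append the target's own token at the first disagreement
        #    (or the bonus token if everything matched).
        accepted = 0
        while accepted < k and proposal[accepted] == verified[accepted]:
            accepted += 1
        out.extend(proposal[:accepted])
        out.append(verified[accepted])
    return out[:goal], target_passes


if __name__ == "__main__":
    prompt = [1, 2, 3]
    spec, passes = speculative_decode(target_model, draft_model, prompt, 20)
    assert spec == greedy_decode(target_model, prompt, 20)
    print(f"20 tokens in {passes} target passes")
```

The output matches plain greedy decoding with the target, because a drafted token is only kept when the target would have picked it anyway. The saving comes from needing fewer target passes than tokens, and how many fewer depends entirely on how often the draft agrees.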