Hacker News

maxiniol 21 hours ago

Wondering about Google's multi-token prediction (MTP): why isn't it being implemented in every new major model? Is the 750 tokens/s achieved using this technique?

adam_arthur 21 hours ago

MTP or something similar is probably being used on the backend, but that's transparent to the end user.
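
To illustrate what "transparent" could mean here, below is a toy sketch of a draft-and-verify decoding loop, one common way multi-token predictions are used to speed up serving. Everything in it is an assumption for illustration: `target_next`, `draft_next`, and the vocabulary size are hypothetical stand-ins, not any provider's actual backend, and nothing in this thread confirms how the 750 tokens/s figure is reached. The point is only that the tokens returned match plain greedy decoding while fewer full-model passes are needed.

```python
from typing import List, Tuple

VOCAB = 50  # toy vocabulary size (hypothetical)


def target_next(ctx: List[int]) -> int:
    """Toy stand-in for the full model's greedy next-token choice."""
    return (sum(ctx) * 31 + len(ctx) * 7) % VOCAB


def draft_next(ctx: List[int]) -> int:
    """Toy cheap predictor: wrong whenever len(ctx) is a multiple of 4."""
    guess = target_next(ctx)
    return guess if len(ctx) % 4 else (guess + 1) % VOCAB


def greedy_decode(prompt: List[int], n: int) -> List[int]:
    """Baseline: one full-model pass per generated token."""
    out = list(prompt)
    for _ in range(n):
        out.append(target_next(out))
    return out[len(prompt):]


def draft_and_verify_decode(prompt: List[int], n: int, k: int = 3) -> Tuple[List[int], int]:
    """Draft k tokens cheaply, then check them against the full model.

    Returns the generated tokens and the number of verification passes.
    In a real system each verification is a single batched forward pass;
    here it is simulated with per-token calls to target_next.
    """
    out = list(prompt)
    passes = 0
    while len(out) - len(prompt) < n:
        # Cheap drafting of k candidate tokens.
        draft: List[int] = []
        for _ in range(k):
            draft.append(draft_next(out + draft))

        # Verification: keep the longest prefix the full model agrees with.
        passes += 1
        accepted: List[int] = []
        for tok in draft:
            expected = target_next(out + accepted)
            if tok != expected:
                accepted.append(expected)  # substitute the full model's token
                break
            accepted.append(tok)
        else:
            # All drafts accepted: the same pass also yields one extra token.
            accepted.append(target_next(out + accepted))
        out.extend(accepted)
    return out[len(prompt):len(prompt) + n], passes


if __name__ == "__main__":
    prompt = [1, 2, 3]
    baseline = greedy_decode(prompt, 20)
    fast, passes = draft_and_verify_decode(prompt, 20)
    print(fast == baseline, passes)
```

Because every accepted token is checked against the full model, the caller sees the same output either way; only the number of expensive passes changes, which is why the end user would not notice the technique beyond the speed.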