I'm wondering about Google's multi-token prediction (MTP): why isn't it being implemented in every new major model? Is the 750 tokens/s figure achieved using this technique?

MTP or something similar is probably being used on the backend, but that's transparent to the end user.
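
To illustrate what "transparent" means here, below is a minimal toy sketch of the draft-and-verify idea that MTP-style decoding is commonly paired with. Everything in it is an assumption made up for illustration: `target_model`, `draft_model`, `greedy_decode`, and `draft_and_verify` are hypothetical names, the "models" are simple arithmetic rules rather than neural networks, and nothing here reflects how Google or any other provider actually serves its models. The point is only that cheap guesses get checked against the main model, so the output is the same as plain decoding and only the number of expensive passes changes.

```python
from typing import Callable, List, Tuple

# Hypothetical type: a "model" maps a token prefix to its next token.
Model = Callable[[List[int]], int]


def target_model(prefix: List[int]) -> int:
    # Toy stand-in for the large model (an assumption, not a real LLM).
    return (sum(prefix) * 31 + len(prefix)) % 50


def draft_model(prefix: List[int]) -> int:
    # Toy stand-in for cheap extra prediction heads: it agrees with the
    # target except when the prefix length is a multiple of 3.
    guess = target_model(prefix)
    return guess if len(prefix) % 3 else (guess + 1) % 50


def greedy_decode(model: Model, prompt: List[int], n: int) -> List[int]:
    # Baseline: one expensive call per generated token.
    out = list(prompt)
    for _ in range(n):
        out.append(model(out))
    return out[len(prompt):]


def draft_and_verify(
    target: Model, draft: Model, prompt: List[int], n: int, k: int = 4
) -> Tuple[List[int], int]:
    out = list(prompt)
    verify_passes = 0
    while len(out) - len(prompt) < n:
        # 1. Cheaply propose k tokens ahead.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))
        # 2. Verify. A real system would score all k positions in one
        #    batched pass of the target; here we just count it as one.
        verify_passes += 1
        accepted: List[int] = []
        for tok in proposal:
            expected = target(out + accepted)
            accepted.append(expected)  # always keep the target's own token
            if tok != expected:
                break  # first mismatch: throw away the remaining drafts
        out.extend(accepted)
    return out[len(prompt):len(prompt) + n], verify_passes


prompt = [1, 2, 3]
baseline = greedy_decode(target_model, prompt, 20)
tokens, passes = draft_and_verify(target_model, draft_model, prompt, 20)
print(tokens == baseline, passes)
```

In this toy setup the generated tokens are identical to the baseline, while the number of verification passes is lower than the number of tokens produced. That is why an end user would notice a difference in speed but not in the text itself.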