Multi-token prediction (MTP) is a good enhancement to training, but it isn't necessarily useful for inference. Other speculative decoding methods, like EAGLE, are built for that. Whether a given approach pays off at inference time is specific to the technique, and the authors of each method write about it.
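
For context, here is a minimal sketch of the generic draft-and-verify loop that speculative decoding builds on, in its greedy form. It is not EAGLE's algorithm or any particular MTP implementation. The names `speculative_step`, `generate`, `draft_next`, and `target_next` are all hypothetical, and the two "models" are toy functions over integers. A drafter (for example a separate small model, or extra prediction heads) would play the role of `draft_next`.

```python
from typing import Callable, List, Tuple

# Hypothetical interface: a "model" maps a token sequence to its greedy next token.
NextToken = Callable[[List[int]], int]


def speculative_step(prefix: List[int], draft_next: NextToken,
                     target_next: NextToken, k: int) -> List[int]:
    """Propose k tokens with the draft model, then verify them with the target."""
    # 1. The cheap draft model proposes k tokens autoregressively.
    proposal: List[int] = []
    for _ in range(k):
        proposal.append(draft_next(prefix + proposal))

    # 2. The target model checks each proposed token. A real system would do
    #    this in one batched forward pass; here it is one call per position.
    accepted: List[int] = []
    for tok in proposal:
        expected = target_next(prefix + accepted)
        if tok != expected:
            # First mismatch: keep the target's token and drop the rest.
            accepted.append(expected)
            return accepted
        accepted.append(tok)

    # 3. Every draft token matched, so the target contributes one extra token.
    accepted.append(target_next(prefix + accepted))
    return accepted


def generate(prefix: List[int], draft_next: NextToken, target_next: NextToken,
             k: int, max_new: int) -> Tuple[List[int], int]:
    """Run speculative steps until max_new tokens exist; return (tokens, steps)."""
    out = list(prefix)
    steps = 0
    while len(out) - len(prefix) < max_new:
        out.extend(speculative_step(out, draft_next, target_next, k))
        steps += 1
    return out[:len(prefix) + max_new], steps


# Toy models (assumptions for illustration only).
def target_next(seq: List[int]) -> int:
    return (seq[-1] + 1) % 10


def draft_next(seq: List[int]) -> int:
    # Agrees with the target except right after token 4.
    return 0 if seq[-1] == 4 else (seq[-1] + 1) % 10


tokens, steps = generate([0], draft_next, target_next, k=3, max_new=8)
print(tokens, steps)
```

In this greedy form the output is identical to what the target model would produce on its own. The only thing the drafter changes is how many verification steps are needed, which is why the quality of the drafter decides whether there is any speedup at inference.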