Yeah important conceptually to remember MTP is kind of just more weights, but speculative decoding is the runtime algorithm that’s a significant add to whatever code is serving the model.

That is.. inaccurate.

How so? I’m not saying most of work doesn’t go into creating the drafting model or enabling a new head on the primary model, but the point is that however cool it is the result is, more weights. Speculative decoding requires code to be aware of how this works at the inference level.