Hacker News

WhitneyLand 21 hours ago [ - ]

Yeah important conceptually to remember MTP is kind of just more weights, but speculative decoding is the runtime algorithm that’s a significant add to whatever code is serving the model.

HumanOstrich 21 hours ago [ - ]

That is.. inaccurate.

WhitneyLand 18 hours ago [ - ]

How so? I’m not saying most of work doesn’t go into creating the drafting model or enabling a new head on the primary model, but the point is that however cool it is the result is, more weights. Speculative decoding requires code to be aware of how this works at the inference level.