And the upside of using draft models for MoE models with so few active parameters (as here or as in the article) is quite small compared to dense models, where you can get enormous speedups. I would prefer running the dense 27B models with speculative decoding instead.
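A rough way to see why: the idealized speculative-decoding estimate (Leviathan et al., 2023) divides the expected number of tokens produced per target forward pass by the extra cost of running the draft, where `c` is the cost of one draft pass relative to one target pass. The sketch below assumes per-token cost scales with active parameters. It uses a hypothetical ~1B-active draft, a hypothetical MoE target with ~3B active, and made-up values for the acceptance rate `alpha` and draft length `gamma`. It also assumes verifying several tokens costs the same as one target step, which may hold even less for MoE because the extra tokens can route to extra experts.

```python
def expected_tokens_per_step(alpha: float, gamma: int) -> float:
    """Expected tokens produced per target forward pass when the draft proposes
    `gamma` tokens, each accepted independently with probability `alpha`."""
    if alpha == 1.0:
        return float(gamma + 1)
    return (1 - alpha ** (gamma + 1)) / (1 - alpha)


def speedup(alpha: float, gamma: int, c: float) -> float:
    """Idealized wall-time speedup. `c` = draft pass cost / target pass cost.
    Assumes verifying gamma + 1 tokens costs the same as one target step."""
    return expected_tokens_per_step(alpha, gamma) / (gamma * c + 1)


if __name__ == "__main__":
    alpha, gamma = 0.8, 4  # hypothetical acceptance rate and draft length
    # Hypothetical cost ratios, assuming cost scales with active parameters:
    dense_c = 1 / 27  # ~1B-active draft vs. 27B dense target
    moe_c = 1 / 3     # ~1B-active draft vs. MoE target with ~3B active
    print(f"dense: {speedup(alpha, gamma, dense_c):.2f}x")
    print(f"MoE:   {speedup(alpha, gamma, moe_c):.2f}x")
```

With these assumed numbers the dense target comes out at roughly 2.9x and the MoE target at roughly 1.4x: the same draft eats a much larger share of a cheap MoE step than of an expensive dense one.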
That is what I have learned, yes. I haven't tested the dense Qwen yet. IIRC the 31B Gemma was slow enough that I doubt MTP would help me much.