Hacker News

And the upside of using draft models with MoE models that have so few active parameters (as here, or as in the article) is quite small compared to dense models, where you can get enormous speedups. I would prefer running the dense 27B models with speculative decoding instead.
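
To put rough numbers on that intuition, here is a toy back-of-the-envelope model, not a benchmark. It assumes decoding is memory-bound (cost roughly proportional to weights read), uses the standard expected-accepted-tokens formula for speculative decoding with an independent per-token acceptance rate `alpha`, and assumes MoE routing is uniform and independent per token. It ignores attention and shared weights, so it overstates the MoE penalty. All names and parameter values (`alpha`, `gamma`, `draft_cost`, 128 experts, top-8) are hypothetical and only for illustration.

```python
def expected_accepted(alpha: float, gamma: int) -> float:
    """Expected tokens emitted per verification step when the draft proposes
    gamma tokens and each is accepted independently with probability alpha."""
    if alpha >= 1.0:
        return float(gamma + 1)
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)


def moe_verify_cost(n_tokens: int, n_experts: int, top_k: int) -> float:
    """Toy cost of verifying n_tokens at once on an MoE model, relative to a
    single-token pass. Assumes uniform, independent routing and that cost is
    proportional to the number of distinct experts whose weights are read."""
    touched = n_experts * (1.0 - (1.0 - top_k / n_experts) ** n_tokens)
    return touched / top_k


def speedup(alpha: float, gamma: int, draft_cost: float, verify_cost: float) -> float:
    """Estimated speedup over plain decoding. Costs are expressed relative to
    one single-token forward pass of the target model."""
    return expected_accepted(alpha, gamma) / (gamma * draft_cost + verify_cost)


# Hypothetical settings, chosen only to illustrate the trend.
alpha, gamma, draft_cost = 0.8, 4, 0.05

# Dense target: all weights are read once whether 1 or gamma + 1 tokens are
# verified, so the verification pass costs about one normal pass.
dense = speedup(alpha, gamma, draft_cost, verify_cost=1.0)

# MoE target: verifying gamma + 1 tokens touches more experts than one token does.
moe = speedup(alpha, gamma, draft_cost, moe_verify_cost(gamma + 1, 128, 8))

print(f"dense: {dense:.2f}x, MoE (toy model): {moe:.2f}x")
```

In practice a draft model is also relatively more expensive next to an MoE target with few active parameters, which this sketch does not model (it uses the same `draft_cost` for both cases).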