There's also some interesting work at NeurIPS this year on fused kernels for MoE: https://flash-moe.github.io/