Would MoE models work better with this approach?