Just get the bigger models to figure out the architecture required for hot-swappable sub-experts without loss of performance!

Got all those tokens, isn’t that the point of auto research and friends??

(Only sort of joking).