Just get the bigger models to figure out the architecture required for hot-swappable sub-experts without loss of performance!
Got all those tokens, isn’t that the point of auto research and friends??
(Only sort of joking).