This sounds like it's proposing that what happens during large-model training is a bit akin to a genetic algorithm: many small subnetworks emerge, a selection process plays out, some get fixed in place, and the rest fade and are repurposed/drift into other roles, and then the cycle repeats.
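
To make the analogy concrete, here's a toy sketch (purely illustrative, not anything from the actual proposal): treat each "subnetwork" as a random feature mask, score it against a dummy fitness function, keep the winners as "fixed", and mutate the rest into new roles each generation.

```python
import random

N_FEATURES = 16
# Hidden "useful circuit" the population is implicitly searching for.
TARGET = [random.randint(0, 1) for _ in range(N_FEATURES)]

def fitness(mask):
    # Dummy fitness: overlap with the hidden target pattern.
    return sum(m == t for m, t in zip(mask, TARGET))

def mutate(mask, rate=0.2):
    # "Repurposing": flip some bits so the subnetwork drifts into a new role.
    return [1 - b if random.random() < rate else b for b in mask]

# Many small candidate subnetworks emerge at random.
population = [[random.randint(0, 1) for _ in range(N_FEATURES)] for _ in range(20)]

for generation in range(30):
    population.sort(key=fitness, reverse=True)
    fixed = population[:5]  # the winners get "fixed"
    # The rest fade and are replaced by mutated copies of the winners.
    drifted = [mutate(random.choice(fixed)) for _ in range(len(population) - 5)]
    population = fixed + drifted

print("best overlap:", fitness(population[0]), "of", N_FEATURES)
```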