This will happen regardless. LLMs are already ingesting their own output. Once AI output becomes the majority of internet content, interesting things will happen. Presumably the AI companies will put a lot of effort into finding good training data, and ironically that will probably be easier for code than for anything else, since there are compilers and linters to lean on.

I've thought about this and wondered if this current moment is actually peak AI usefulness: the signal-to-noise ratio is high now, but once training data becomes polluted with its own slop, things could start getting worse, not better.