The question that I haven't seen answered yet is whether we will reach a sort of "peak vibe coding" phase. What I mean by that is, right now, LLMs are somewhat decent at writing workable code. That code, however, needs babysitting to keep it from going off the rails. And that code is sourced from training data gleaned from the billions of lines of code written by hackers everywhere and pushed to source control.

We currently have engineers competent enough to use an LLM, review the code it writes, and fix the places where it writes poor code. We also still have engineers pushing novel code themselves. That means we are on the up-slope. Right now, nascent hackers may be learning the old ways, but they are certainly also paying attention to and using vibe coding. That creates a degenerative feedback loop. As greybeards age out of programming, so too does the knowledge foundation that allowed LLM training to take place in the first place and, more importantly, that trained the next generation of hackers. AI is going to increasingly consume AI-written code, and I haven't seen solid evidence yet that it is capable (at least currently) of putting truly novel ideas into code.

There will be an inflection point where AIs are consuming more of their own output than output from competent hackers, and that's when things will go downhill unless there is a significant breakthrough in actual reasoning in AI.

The first generation of AlphaGo was trained on human-human games.

The second generation removed that and was trained entirely on computer-generated games.

Precisely because human data is running out, synthetic data is a major focus right now at all the AI labs.

This has been my suspicion since LLMs began eating the Internet. Whether it's code or writing, now that LLMs are consuming their own output, the Habsburg Jaw[1] is going to become evident quickly. It is very difficult, and sometimes impossible, to know whether a given chunk of input is wholly or partially generated by an LLM. Nevertheless, filtering input may become a critical task. That expense will be passed to the consumer, and LLM prices will necessarily rise as their quality diminishes. It could become a death spiral.

If so, I, for one, will be relieved. I'm tired of LLMs trying to take over the enjoyable parts of writing and coding, and leaving the menial tasks to us humans.

[1] https://www.smithsonianmag.com/smart-news/distinctive-habsbu...

Nothing I've seen from the AI labs appears to indicate that they are worried about model collapse in the slightest.

That makes sense to me: if their models start getting worse because there's slop in the training data, they can detect the regression and take steps to fix it.

Their entire research pipeline is about finding what makes models score better! Why would they keep going with a technique that scored worse?

> Nothing I've seen from the AI labs appears to indicate that they are worried about model collapse in the slightest.

AI labs are insufferable hype machines; they are unlikely to sow doubt about their own business models.

> they can detect that and take steps to fix it.

Each model will need an endless diet of new content to remain relevant, and over time, avoiding ingestion of LLM output (and the accompanying inbreeding depression) will likely be a tricky proposition. Not impossible, but expensive and error-prone.
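The "inbreeding depression" mechanism has a simple statistical analogue that's easy to demonstrate. This is a toy sketch, not how any lab actually trains: repeatedly fit a Gaussian to samples drawn from the previous generation's fitted Gaussian. Each refit loses a little variance on average, so the distribution's "diversity" decays generation over generation. The function name, parameters, and seed are all made up for illustration.

```python
import random
import statistics

def collapse_demo(generations=200, n=100, seed=42):
    """Toy model collapse: each generation is fit only to
    samples produced by the previous generation's fit."""
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # generation 0: the "human" distribution
    history = [sigma]
    for _ in range(generations):
        # Draw a finite sample from the current model...
        data = [rng.gauss(mu, sigma) for _ in range(n)]
        # ...then refit to that sample alone (MLE estimates).
        mu = statistics.fmean(data)
        sigma = statistics.pstdev(data)  # biased low by (n-1)/n each round
        history.append(sigma)
    return history

hist = collapse_demo()
print(f"sigma: gen 0 = {hist[0]:.3f}, gen {len(hist)-1} = {hist[-1]:.3f}")
```

The decay comes from two compounding effects: the maximum-likelihood variance estimate is biased low, and random sampling drift is never corrected by fresh outside data. Mixing in even a fraction of "human" samples each generation slows the collapse dramatically, which is roughly why filtering and data curation matter so much here.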