The wall is training data. An AI can't produce its own training data because an AI can't be smarter than its own training data. This is a well-known regression problem, and one I personally believe is not solvable. (A softer claim: it's not solvable with current technology.)
I used to think this, but no one I have read believes data is the problem.
Amodei explains that if data, model size, and compute all scale up linearly together, then the reaction happens (his chemical-reaction analogy for scaling).
I don't understand why data wouldn't be a problem, but it seems that if it were, we would have run into it already, and it would have already been overcome with synthetic data.
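To make the "data is one of the ingredients" point concrete, here's a rough sketch using the parametric loss fit from the Chinchilla paper (Hoffmann et al., 2022). The constants are the paper's approximate fitted values and are purely illustrative, not a claim about any current model; the takeaway is just that loss has a separate floor term for data, so if data stops scaling, adding parameters alone stops buying much.

```python
# Rough illustration of a Chinchilla-style parametric loss fit
# (Hoffmann et al. 2022). Constants are the paper's approximate
# fitted values; this is an illustration, not a prediction.

def chinchilla_loss(n_params: float, n_tokens: float) -> float:
    E = 1.69                 # irreducible loss of the data distribution
    A, alpha = 406.4, 0.34   # parameter-count term
    B, beta = 410.7, 0.28    # training-token (data) term
    return E + A / n_params**alpha + B / n_tokens**beta

# Scale parameters 100x while holding data fixed: the data term
# B / D^beta becomes the floor, so the loss barely moves.
fixed_data = 1.4e12  # ~1.4T tokens, held constant
for n in (7e10, 7e11, 7e12):  # 70B -> 700B -> 7T parameters
    print(f"N={n:.0e}, D fixed:  loss={chinchilla_loss(n, fixed_data):.3f}")

# Scale data alongside parameters (~20 tokens per parameter, the
# Chinchilla rule of thumb): the loss keeps dropping instead.
for n in (7e10, 7e11, 7e12):
    print(f"N={n:.0e}, D scaled: loss={chinchilla_loss(n, 20 * n):.3f}")
```

Under this fit, the fixed-data runs flatten out around the data term's floor, while the data-scaled runs keep improving, which is one way to read the "all ingredients must scale together" argument.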