Reacting to the story itself, I've been down the same line of thought but came to the opposite conclusion. Precisely because code generation is unreliable, one of the metrics we will use in the future to determine the value of code is how much it has been tested against the real world. Real-world-tested code will always be more valuable than code that has just been instantiated by an AI, and that holds indefinitely, because no AI will ever be able to integrate completely with all the other AI-generated code in the world on the first try. That is, as AIs get better at generating code, we will inevitably generate more code with them, and later code must then deal with that increased volume. So AIs can never "catch up" with code complexity, because the problem gets worse the better they get.

This story is itself the explanation of why we're not going to go this route at scale. It will happen in isolated places for the indefinite future. But farmers are going to buy systems, AI-generated or not, that have been field-tested, and they will be no more interested in calling new untested code into being for their own personal use on their own personal farm than they are today.

The limiting factor for future code won't be how much AI firepower someone can bring to bear on a problem, but how much "real world" there is to test the code against, because there is only so much "real world" to go around.

(Expanded on: https://jerf.org/iri/post/2026/what_value_code_in_ai_era/)