Surely a well written spec would include functional requirements like resilience and performance?

However I agree that's the hard part. I can write a spec for finding the optimal solution to some combinatorial problem - where the naive code is trivial - a simple recursive function for example - but such a function would use near infinite time and memory.

In terms of the ML programme really being a compiler - isn't that in the end true - the ML model is a computer programme taking a spec as input and generating code as output. Sounds like a compiler to me.

I think the point of the AK post is to say the challenge is in the judging of solutions - not the bit in the middle.

So to take the writing software problem - if we had already sorted the computer programme validation problem there wouldn't be any bugs right now - irrespective of how the code was generated.