I think the other aspect to this, which you allude to at the end, is that all of these arguments start from the assumption that all human software engineers produce high-quality code that meets the requirements, but obviously that’s very much not the case in the real world. After all, 80-90% of drivers rate themselves as above average.

If one compares a single competent software engineer directing a number of agents against a random group of engineers (not necessarily working at FAANG or a YC startup), then those quality arguments are going to be significantly less compelling.