> Tests (as usually written, in unit-test form) only tell you that it's not completely broken; they're not a good indicator of it working well, otherwise "vibecoded slop" wouldn't be a thing
You can certainly end up with vibecoded slop that passes all the tests, but it won't pass other forms of evaluation. (This is necessarily true: otherwise you could not identify it as vibecoded slop.)
> The same could be said for human CEOs. A lot of them don't really have good success rates either.
This is part of my point. The tight feedback loop that lets us judge a model's efficacy in software doesn't exist for the role of CEO.