I think the obvious solution here is to beef up the test side of the app, much more than you would when writing code by hand. Tests represent project knowledge in executable form, so the LLM doesn't need to carefully remember every detail they encode: if it breaks something, a test fails. You also don't need to vet every small interaction yourself, because the suite automates part of the review work as well.
Even better if the project was built from the start to be easy to test and observe. But my golden rule remains: no code without tests, and keep expanding the test suite all the time.
I agree: human-steered, AI-implemented test cases can at least capture the acceptance criteria.
That also makes review more efficient: you can check whether existing test cases are being modified as part of delivering something new, and if so, look into why.
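One way to make that check mechanical, as a rough sketch: the Python function below reads `git diff --name-status` output and flags pre-existing test files that were modified, deleted, renamed, or changed type, while ignoring newly added tests. The test-path convention (`tests/` directories, `test_*.py` / `*_test.py` filenames) and the names `is_test_path` / `changed_existing_tests` are my own assumptions, not anything standard; adjust them to your layout.

```python
import fnmatch
from pathlib import PurePosixPath

# Assumed (hypothetical) convention for what counts as a test file.
TEST_PATTERNS = ("test_*.py", "*_test.py")
TEST_DIRS = ("tests",)


def is_test_path(path: str) -> bool:
    """True if the path lives under a test directory or has a test-style filename."""
    p = PurePosixPath(path)
    if any(part in TEST_DIRS for part in p.parts[:-1]):
        return True
    return any(fnmatch.fnmatch(p.name, pat) for pat in TEST_PATTERNS)


def changed_existing_tests(name_status: str) -> list[tuple[str, str]]:
    """Return (status, path) for test files that existed before the change
    and were touched. Lines look like 'M<TAB>path' or, for renames and copies,
    'R087<TAB>old<TAB>new'."""
    flagged = []
    for line in name_status.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        status = fields[0][0]  # 'R087' -> 'R', 'C75' -> 'C'
        old_path = fields[1]
        # Added files and copies leave every pre-existing file untouched.
        if status in ("A", "C"):
            continue
        if is_test_path(old_path):
            flagged.append((status, old_path))
    return flagged
```

You'd feed it something like the output of `git diff --name-status main...HEAD`. An empty result means the change only added tests; anything else is a prompt to ask why an existing expectation had to change.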