Solo founder here, shipping a real product built mostly with AI. The code review thing is real but my actual daily pain is different. AI lies about being done. It'll say "implemented" and what it actually did is add a placeholder with a TODO comment. Or it silently adds a fallback path that returns hardcoded data when the real API fails, and now your app "works" but nothing is real.
I've also given it explicit rules like "never use placeholder images, always generate real assets" — and it just... ignores them sometimes. Not always. Sometimes. Which is worse, because you can't trust it but you also can't not use it.
The 80% it writes is fine. The problem is you still have to verify 100% of it.
Have you tried using an additional agent to verify the outputs? It seems that can help if the supervising agent has a small context demand on it. (ie. run this command, make sure it returns 0, invoke main coding agent with error message if it doesn't)
Yeah I've experimented with that pattern. The meta-agent approach works for catching obvious stuff, like "did the build pass" or "does this file actually exist." But the harder bugs are semantic. The agent writes a function that returns the right shape of data but with wrong values, or adds a fallback that masks the real failure. A supervising agent reading the same code often has the same blind spots.
What's worked better for me is building verification into the workflow itself, like explicit test assertions the agent has to pass before it can claim "done," plus a rule that any API call must show a real response, not a mock. Basically treating the AI like a junior dev who needs guard rails, not a senior who just needs a code review.