"I push very high test coverage on all my projects (85%+)"

Coverage doesn't matter if the tests aren't good. If you're not verifying the tests are actually doing something useful, talking about high coverage is just wanking.
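
To make that concrete, here's a minimal sketch. Everything in it is made up for illustration (`apply_discount`, `weak_test`, `useful_test`); it's not from anyone's actual project. Both tests execute every line of the function, so a coverage tool would report the same number for each, but only one of them notices the bug.

```python
def apply_discount(price, percent):
    """Hypothetical function under test. Deliberately buggy: adds instead of subtracts."""
    return price + price * percent / 100


def weak_test(fn):
    # Runs every line of fn, so line coverage is 100%,
    # but never checks the result.
    fn(100, 10)
    return True


def useful_test(fn):
    # Same coverage, but actually verifies the behavior.
    return fn(100, 10) == 90


print(weak_test(apply_discount))    # True: passes despite the bug
print(useful_test(apply_discount))  # False: catches the bug
```

Same coverage number, completely different value. That's the gap the percentage can't show you.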

"have the agent create progressively bigger integration tests, until I hit e2e/manual validation."

Same thing. It doesn't matter how big the tests are if they're not testing the right thing. Also, why is e2e slashed with manual? Those are orthogonal: e2e is about how much of the system a test covers, manual is about who runs it. E2E tests can [and should] be fully automated for many [most?] systems. And manual validation doesn't have to wait for full e2e.
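
For what it's worth, an automated e2e check doesn't need to be elaborate. Here's a rough sketch using only the Python standard library. `EchoHandler` and `run_e2e_check` are hypothetical names, and the tiny HTTP server is just a stand-in for whatever your real system is. The point is only that the whole loop (start the thing, hit it from outside, check the response, tear it down) runs with no human involved.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoHandler(BaseHTTPRequestHandler):
    # Hypothetical stand-in for the real system under test.
    def do_GET(self):
        body = json.dumps({"path": self.path, "status": "ok"}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep test output quiet


def run_e2e_check():
    # Port 0 asks the OS for any free port.
    server = HTTPServer(("127.0.0.1", 0), EchoHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        url = f"http://127.0.0.1:{server.server_address[1]}/health"
        # Empty ProxyHandler so a proxy configured in the environment is not used.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(url, timeout=5) as resp:
            return resp.status, json.loads(resp.read())
    finally:
        server.shutdown()
        server.server_close()


status, payload = run_e2e_check()
assert status == 200
assert payload == {"path": "/health", "status": "ok"}
```

A real e2e suite would drive the actual deployed stack rather than a toy handler, but the shape is the same, and nothing about it requires a person clicking through it.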