I use three AIs (Claude, GPT, and Gemini) to review each other's design plans and implementations on the same codebase. Each one often catches problems the others miss.
I try to make sure the architecture docs for the codebase are refreshed regularly to reflect recent changes, so it's easier for both humans and AI agents to make sense of the code.
I also regularly pause all other development and focus solely on auditing the codebase with these AIs to make sure it is secure, robust, clean, well structured, and well tested. Some refactoring is needed most of the time, and it's well worth it.
With this approach, I now often merge AI-written code without completely understanding what it does, but so far the code seems to be working. :)
You’ve transitioned from “individual contributor” to “manager”! (;->
Haha, true!
I do sometimes have to steer the discussion between the AIs back in the right direction when it drifts too far from the real problem, either because they're missing some context or because my original description of the problem was misleading.
To do that formally, I built a mechanism into the review loop: if a comment on a GitHub issue or PR is signed "-- Human Reviewer", all AI agents must treat that comment as the highest-priority item to address.
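A minimal sketch of how such a signature rule might be enforced, assuming the agent harness receives issue/PR comments as plain text and that "signed" means the signature is the last non-empty line. The `Comment` type and the `is_human_review` / `prioritize` helpers are hypothetical, not part of any GitHub API:

```python
from dataclasses import dataclass
from typing import List

# Signature string taken from the comment above.
HUMAN_SIGNATURE = "-- Human Reviewer"


@dataclass
class Comment:
    author: str
    body: str


def is_human_review(comment: Comment) -> bool:
    # Assumption: a comment is "signed" if its last non-empty line is the signature.
    lines = [ln.strip() for ln in comment.body.splitlines() if ln.strip()]
    return bool(lines) and lines[-1] == HUMAN_SIGNATURE


def prioritize(comments: List[Comment]) -> List[Comment]:
    # sorted() is stable: human-signed comments move to the front,
    # and the original order is kept within each group.
    return sorted(comments, key=lambda c: not is_human_review(c))
```

The agents would then work through `prioritize(comments)` in order, so a signed human comment is always handled before any AI reviewer's feedback.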
I'm always curious when I see these stories. How long have you been doing this, for what sort of work, and was the codebase mature before you began working like this?
Yeah, this one is easy: I've been doing this for half a year. I've built a couple of projects this way, all greenfield, and each codebase grew from zero to tens of thousands of lines.
That is interesting. Half a year is not nothing, and I'd expect it's harder to keep a project functioning when the base is vibe-coded rather than built on mature abstractions and architecture.
I am still skeptical of this method's ability to deliver polished products, though. I've kept an eye out for it in the OSS world and don't think I've seen anything big yet.
This is the way. I use gh copilot: I have Opus interrogate me and write the plan, then have GPT review the plan and provide feedback. I repeat this several times until GPT is either satisfied or starts nitpicking unimportant stuff. Then I sanity-check the plan myself and have GPT implement it.
I also review each implementation before merging it to master. I complete PRs only when I'm satisfied with the implementation, my feedback has been addressed, and I fully understand what is going on. Agents are a replacement for typing and a productivity multiplier.
I have a big-picture view of the product; each plan implements only part of it, scoped to avoid merging unreviewed slop. It's probably slower, but the result is much better.
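The plan/review loop described above can be sketched roughly as follows. This is a hypothetical illustration, not the commenter's actual tooling: `draft` and `review` stand in for calls to the planning and reviewing models, and the "major"/"nit" severity labels are an assumed way of telling real feedback from nitpicks:

```python
from typing import Callable, List, Optional, Tuple

# Assumed feedback shape: (severity, message), severity is "major" or "nit".
Feedback = Tuple[str, str]


def refine_plan(
    draft: Callable[[Optional[str], List[Feedback]], str],
    review: Callable[[str], List[Feedback]],
    max_rounds: int = 5,
) -> Tuple[str, int]:
    """Alternate planner and reviewer until only nitpicks (or nothing) remain."""
    plan: Optional[str] = None
    feedback: List[Feedback] = []
    for rounds in range(1, max_rounds + 1):
        plan = draft(plan, feedback)  # planner writes or revises the plan
        feedback = review(plan)       # reviewer critiques the new plan
        if not any(sev == "major" for sev, _ in feedback):
            return plan, rounds       # satisfied, or only nitpicking now
    # Round budget exhausted; the human sanity check happens either way.
    return plan, max_rounds
```

The round cap is there so the loop still hands a plan to the human reviewer if the two models never converge.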
Cool. Yeah, it's important to have a big-picture view of the product so you can steer the AIs in the right direction.