I never look at code. It used to be that it quickly became unmaintainable spaghetti where the agent struggled to make any change at all, but in the past year (and with a three-step plan/develop/review workflow), the quality has gotten so good that I basically just don't look at the code any more.

It definitely has fewer bugs than a senior developer, but it really hinges on getting the plan right. 20 minutes of planning and 20 of implementation sounds about right for my workflow as well, just make sure you have GPT as a reviewer. It's very nitpicky and finds lots of bugs.

This brings to mind two thoughts:

First, that this is challenging to scale across large orgs. Even if your plans produce high quality code, that isn’t true for everyone. I’m definitely struggling with slop code being collectively mailed to me for review by our 1,000 engineers who were told to use their AI subscription all at once.

I feel like we should be taking “prompt engineering” more seriously. And when people mail me code to review, it should also include the agentic workflow and plan. That way, when code isn’t up to quality, we can have a discussion about the prompts used to generate it.

My second thought is related to your senior engineer comment. This isn’t surprising, because in most engineering orgs, seniority is completely unrelated to code quality. In fact, many orgs incentivize the opposite: “senior” devs who push out buggy code quickly and push accountability downhill to the junior devs.

Eh, everything is challenging to scale across large orgs. Even before LLMs, the code was a huge ball of spaghetti that barely held together. Now we just get there faster.

About senior engineers, I guess that depends on the org you have experience with. My experience doesn't match yours.