> Does AI make incredibly inefficient code most of the time? Yup. But it does it at lightspeed with minimal effort.

This hits the nail on the head.

Detractors often latch on to examples of coding assistants making mistakes or outputting subpar code, but they somehow miss the fact that coding assistants can also be prompted again and refactor whole swaths of code just as fast as they introduce oopsies. This means that even the worst-case scenario converges quickly to an acceptable outcome, and from there you can iterate just as quickly to improve on it.

The problem is that this approach is not sustainable. Errors compound. The cost to fix one issue might seem small at first, but over time all these "oopsies" turn into architectural spaghetti that can only be fixed with a complete rewrite, which will certainly be more expensive than developing the code "organically" in the first place.

The only way I see AI coding working in the long run is if we go back to a Waterfall/BDUF process and do actual engineering. Let engineers really own the architecture. Require that any new feature, no matter how small, be specced out with complete sequence diagrams. Ensure that every new software package is put on a UML component diagram so the team can review it and see how each addition interacts with the whole system, etc.

If we do that, then we can just give all the documents to a coding agent and say "go ahead and implement this" with at least a minimum amount of confidence. But in doing this, I bet we will realize the following:

- the "effort" has never been about writing the code itself. The code is just the material manifestation of all the thought that went into solving the problems the product is attempting to address.

- we will likely be better off using code generation tools (e.g., UML-to-code) and a "weak" LLM (that can run locally) than playing the token lottery at the Anthropic Casino.

I mirror your thoughts. I think we'll end up with a "perfect map" paradox: you cannot be vague or indecisive about what you want (and if you are, then those decisions don't matter), so you end up creating a 1:1 representation of what the code needs to be.

I'd substitute "owner" for "the team", and in that sense the owner will not need to be human.

We're at a stage where Claude is great at doing the "middle" part of the work, but it's crap at gathering requirements and verifying what it has done. I also don't see people caring about these aspects of software development, as shown in the article.

I haven’t used Fable/Mythos yet, but my experience with recent versions of Opus, GPT 5.5 and recent Chinese models is that prompting again isn’t guaranteed to fix the underlying issues, nor is it guaranteed to not introduce more issues. I’ve seen SOTA models make ridiculously stupid architectural decisions that they were then unable to back out of without being prompted very specifically, instead adding a patchwork of “fixes” on top.

I’m not saying that you can’t use AI to do it, because I believe that with carefully controlled workflows and context management you can. But it’s not a simple prompt away: it requires guidance and understanding, and it isn’t the speed demon that raw prompting is.

> I haven’t used Fable/Mythos yet, but my experience with recent versions of Opus, GPT 5.5 and recent Chinese models is that prompting again isn’t guaranteed to fix the underlying issues, nor is it guaranteed to not introduce more issues.

That's not really the point though. That presumes models are only useful if they are one-shot models. That is false.

I mean, what if your prompt successfully changes 20 source files and makes a mess in one? How much work did it save?

And the elephant in the room is the case where models actually outperform whatever the prompter is able to deliver, and do it faster. That is somehow left out.

> That presumes models are only useful if they are one-shot models

That’s not at all what I’m saying.

I’m saying that in my experience across multiple models, the follow-up prompts don’t fix prior underlying issues. They usually patch on top instead, unless you give them significant and time-consuming guidance.

I want them to be more useful outside of one-shot uses, but I find that they currently miss the mark.

I think this is overlooking the fact that assigning a coding assistant to fix the bugs it re-introduces for all eternity just leads to spiraling token costs, which might cost more than just hiring a competent engineer in the first place.

Don't forget that you can adjust your requirements (either via plan or skill) to ensure the mistakes do not happen. The problem is that neither LLMs nor humans (who don't work in the domain) will know they made these mistakes. Even coders don't think about everything all the time.

> Don't forget that you can adjust your requirements (either via plan or skill) to ensure the mistakes do not happen.

No, you can't. Adjusting prompts ensures absolutely nothing.

I disagree. What I should have added is that with agents (as well as humans) you do need to have tests that verify what was done.
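
To make that concrete, here is a rough sketch of the kind of check I mean. Everything in it is made up for illustration: `slugify` is a hypothetical function standing in for whatever the agent (or a human) wrote, and `check_slugify` is just a hand-rolled set of assertions, not any particular test framework. The point is only that the checks pin down behavior independently of who produced the implementation.

```python
import re


def slugify(title: str) -> str:
    """Hypothetical function standing in for agent-written code."""
    # Lowercase, collapse every run of non [a-z0-9] characters into one hyphen,
    # then trim hyphens from both ends.
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def check_slugify() -> None:
    """Behavioral checks that must hold no matter who wrote slugify."""
    # Concrete examples taken from the (assumed) requirements.
    assert slugify("Hello, World!") == "hello-world"
    assert slugify("  already-slugged  ") == "already-slugged"
    assert slugify("") == ""

    # Property-style check: slugifying a slug must not change it.
    for sample in ["Hello, World!", "a--b", "tabs\tand\nnewlines"]:
        once = slugify(sample)
        assert slugify(once) == once


check_slugify()
```

If a follow-up prompt "fixes" something by breaking one of these, the run fails right away instead of the breakage being found weeks later. It doesn't catch requirements nobody thought to write down, which is the other half of the problem.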

In my experience, the refactors are just as bad, just in different ways. All you end up doing is treading water with different iterations of shitty code. By the time you get somewhere acceptable, you could've just fixed it up yourself.

My preferred workflow these days is to pair program with an LLM until it gets close-ish and then manually touch it up. Without that, it just produces junk in different forms.