Sure, if all you ask it to do is fix bugs. You can also ask it to work on code health things like better organization, better testing, finding interesting invariants and enforcing them, and so on.
It's up to you what you want to prioritize.
Sure, if all you ask it to do is fix bugs. You can also ask it to work on code health things like better organization, better testing, finding interesting invariants and enforcing them, and so on.
It's up to you what you want to prioritize.
I have some healthy skepticism on this claim though. Maybe, but there will be a point of diminishing returns where these refactors introduce more problems than they solve and just cause more AI spending.
Code is always a liability. More code just means more problems. There has never been a code generating tool that was any good. If you can have a tool generate the code, it means you can write something on a higher level of abstraction that would not need that code to begin with.
AI can be used to write this better quality / higher level code. That's the interesting part to me. Not churning out massive amounts of code, that's a mistake.
"What can we do to reduce the size of the codebase" seems like an interesting prompt to try.
There's an interesting phenomenon I noticed with the "skeptics". They're constantly using what-ifs (aka goalpost moving), but the interesting thing is that those exact same what-ifs were "solved" earlier, but dismissed as "not good enough".
This exact thing about optimisation has been shown years ago. "Here's a function, make it faster". With "glue" to test the function, and it kinda worked even with GPT4 era models. Then came alphaevolve where google found improvements in real algorithms (both theoretical i.e. packing squares and practical i.e. ML kernels). And yet these were dismissed as "yeah, but that's just optimisation, that's easyyyy. Wake me up when they write software from 0 to 1 and it works".
Well, here we are. We now have a compiler that can compile and boot linux! And people are complaining that the code is unmaintainable and that it's slow / unoptimised. We've gone full circle, but forgot that optimisation was easyyyy. Now it's something to complain about. Oh well...
I use LLM’s daily and agents occasionally. They are useful, but there is no need to move any goal posts; they easily do shit work still in 2026.
All my coworkers use agents extensively in the backend and the amount of shit code, bad tests and bugs has skyrocketed.
Couple that with a domain (medicine) where our customer in some cases needs to validate the application’s behaviour extensively and it’s a fucking disaster —- very expensive iteration instead of doing it well upfront.
I think we have some pretty good power tools now, but using them appropriately is a skill issue, and some people are learning to use them in a very expensive way.
Microsoft will be an excellent real-world experiment on whether this is any good. We so easily forget that giant platform owners are staking everything on all this working exactly as advertised.
Some of my calculations going forward will continue to be along the lines of 'what do I do in the event that EVERYTHING breaks and cannot be fixed'. Some of my day job includes retro coding for retro platforms, though it's cumbersome. That means I'll be able to supply useful things for survivors of an informational apocalypse, though I'm hoping we don't all experience one.
I agree but want to interject that "code organization " won't matter for long.
Programming Languages were made for people. I'm old enough to have programmed in z80 and 8086 assembler. I've been through plenty of prog.langs. through my career.
But once building systems become prompting an agent to build a flow that reads these two types of excels, cleans them,filters them, merges them and outputs the result for the web (oh and make it interactive and highly available ) .
Code won't matter. You'll have other agents that check that the system is built right, you'll have agents that test the functionality and agents that ask and propose functionality and ideas.
Most likely the Programming language will become similar to the old Telegraph texts (telegrams) which were heavily optimized for word/token count. They will be optimized to be LLM grokable instead of human grokable.
Its going to be amazing.
What you’re describing is that we’d turn deterministic engineering into the same march of 9s that FSD and robotics are going through now - but for every single workflow. If you can’t check the code for correctness, and debug it, then your test system must be absolutely perfect and cover every possible outcome. Since that’s not possible for nontrivial software, you’re starting a march of 9s towards 100% correctness of each solution.
That accounting software will need 100M unit tests before you can be certain it covers all your legal requirements. (Hyperbole but you get the idea) Who’s going to verify all those tests? Do you need a reference implementation to compare against?
Making LLM work opaque to inspection is kind of like pasting the outcome of a mathematical proof without any context (which is almost worthless AFAIK).
Will you trust code like this to run airplanes?
Remember, even Waymo has a ton of non-AI code it is built upon. We will still have PyTorch, embedded systems software, etc.
There are certainly people working on making this happen. As a hobbyist, maybe I'll still have some retro fun polishing the source code for certain projects I care about? (Using our new power tools, of course.)
Your assuming that scrum/agile/management won't take this over?
What stakeholder is prioritizing any of those things and paying for it out of their budget?
Code improvement projects are the White Whale of software engineering - obsessed over but rarely from a business point of view worth it.
The costs for code improvement projects have gone down dramatically now that we have power tools. So, perhaps it will be considered more worthwhile now? But how this actually plays out for professional programming is going to depend on company culture and management.
In my case, I'm an early-retired hobbyist programmer, so I control the budget. The same is true for any open source project.
And what happens when these different objectives conflict or diverge ? Will it be able to figure out the appropriate trade-offs, live with the results and go meta to rethink the approach or simply delude itself ? We would definitely lose these skills if it continues like this.