If you spend 5-15x the time reviewing what the LLM is doing, are you saving any time by using it?

No, but that's the crux of the AI problem in software. Time to write code was never the bottleneck. AI is most useful for learning, either via conversation or by seeing examples. It makes writing code faster too, but only a little after you take into account review. The cases where it shines are high-profile and exciting to managers, but not common enough to make a big difference in practice. E.g AI can one-shot a script to get logs from a paginated API, convert it to ndjson, and save to files grouped by week, with minimal code review, but only if I'm already experienced enough to describe those requirements, and, most importantly, that's not what I'm doing every day anyway.

I'm finding it in some cases I'm dealing with even more code given how much code AI outputs. So yeah, for some tasks I find myself extremely fast but for others I find myself spending ungodly amounts of time reviewing the code I never wrote to make sure it doesn't destroy the project from unforseen convincing slop.

A related Dirty Secret that's going to become clear from all this is that a very large proportion of code in the wild (yes, even in 2026—maybe not in FAANG and friends, IDK, but across all code that is written for pay in the entire economy) has limited or no automated test coverage, and is often being written with only a limited recorded spec that's usually fleshed out only to the degree needed (very partial) as a given feature is being worked on.

What do the relatively hands-off "it can do whole features at a time" coding systems need to function without taking up a shitload of time in reviews? Great automated test coverage, and extensive specs.

I think we're going to find there's very little time-savings to be had for most real-world software projects from heavy application of LLMs, because the time will just go into tests that wouldn't otherwise have been written, and much more detailed specs that otherwise never would have been generated. I guess the bright-side take of this is that we may end up with better-tested and better-specified software? Though so very much of the industry is used to skipping those parts, and especially the less-capable (so far as software goes) orgs that really need the help and the relative amateurs and non-software-professionals that some hope will be able to become extremely productive with these tools, that I'm not sure we'll manage to drag processes & practices to where they need to be to get the most out of LLM coding tools anyway. Especially if the benefit to companies is "you will have better tests for... about the same amount of software as you'd have written without LLMs".

We may end up stuck at "it's very-aggressive autocomplete" as far as LLMs' useful role in them, for most projects, indefinitely.

On the plus side for "AI" companies, low-code solutions are still big business even though they usually fail to deliver the benefits the buyer hopes for, so there's likely a good deal of money to be made selling companies LLM solutions that end up not really being all that great.

> better-specified software

Code is the most precise specification we have for interfacing with computers.

There are some cases where AI is generating binary machine code, albeit small amounts. What do we have when we don't have the code?

Machine code is still code, even if the representation is a bit less legible than the punch cards we used to use.

Bingo. Hopefully there are some business opportunities for us in that truth.

Re. productivity, if LLM's are a genuine boost with 1/3 of the work, neutral 1/3 of the time, and actually worse 1/3 of the time, it's likely we aren't really seeing performance improvements as 1) people are using them for everything and b) we're still learning how to best use them.

So I expect over time we will see genuine performance improvements, but Amdahl's law dictates it won't be as much as some people and ceo's are expecting.

> because the time will just go into tests that wouldn't otherwise have been written

Writing tests to ensure a program is correct is the same problem as writing a correct program.

Evaluating conformance is a different category of concern from ensuring correctness. Tests are about conformance not correctness.

Ensuring correct programs is like cleaning in the sense that you can only push dirt around, you can't get rid of it.

You can push uncertainty around and but you can't eliminate it.

This is the point of Gödel's theorem. Shannon's information theory observes similar aspects for fidelity in communication.

As Douglas Adams noted: ultimately you've got to know where your towel is.

These companies don't care about saving time or lowering operating costs, they have massive monopolies to subsidize their extremely poor engineering practices with. If the mandate is to force LLM usage or lose your job, you don't care about saving time; you care about saving your job.

One thing I hope we'll all collectively learn from this is how grossly incompetent the elite managerial class has become. They're destroying society because they don't know what to do outside of copying each other.

It has to end.

To be honest, some times it's still beneficial.

For fairly straightforward changes it's probably a wash, but ironically enough it's often the trickier jobs where they can be beneficial as it will provide an ansatz that can be refined. It's also very good at tedious chores.

And spotting stuff in review! Sometimes it’s false positives but on several occasions I’ve spent ~15-30 minutes teaching-reviewing a PR in person, checked afterwards and it matched every one of the points.

Some, but not very much. Writing code is hard. Ai will do a lot of tedious code that you procrastinate writing.

Also when you are writing code yourself you are implicitly checking it whilst at the back of your mind retaining some form of the entire system as a whole.

People seem to gloss over this... As a CEO if people don't function like this I'd be awake at night sweating.

That’s the reverse-centaur issue I see: humans are not great at repetitive nuanced similar seeming tasks, putting the onus on humans to retroactively approve high volumes of critical code has them managing a critical failure mode at their weakest and worst. Automated reviews should be enhancing known good-faith code, manual reviews of high volume superficially sound but subversive code is begging for issues over time.

Which results the software engineering issue I’m not seeing addressed by the hype: bugs cost tens to hundreds of times their coding cost to resolve if they require internal or external communication to address. Even if everyone has been 10x’ed, the math still strongly favours not making mistakes in the first place.

An LLM workflow that yields 10x an engineer but psychopathically lies and sabotages client facing processes/resources once a quarter is likely a NNPP (net negative producing programmer), once opportunity and volatility costs are factored in.

I have found ai extreemly good at finding all those really hard bugs though. Ai is a greater force multiplier when there is a complex bug than in gneen field code.

> Even if everyone has been 10x’ed, the math still strongly favours not making mistakes in the first place

The math depends on importance of the software. A mistake in a typical CRUD enterprise app with 100 users has zero impact on anything. You will fix it when you have time, the important thing is that the app was delivered in a week a year ago and was solving some problem ever since. It has already made enormous profit if you compare it with today’s (yesterday’s ?) manual development that would take half a year and cost millions.

A mistake in a nuclear reactor control code would be a total different thing. Whatever time savings you made on coding are irrelevant if it allowed for a critical bug to slip through.

Between the two extremes you thus have a whole spectrum of tasks that either benefit or lose from applying coding with LLMs. And there are also more axes than this low to high failure cost, which also affect the math. For example, even non-important but large app will likely soon degrade into unmanageable state if developed with too little human intervention and you will be forced to start from scratch loosing a lot of time.

Sortof. I work on a system too large for anyone to know the whole thing. Often people who don't know each other do something that will break the other. (Often because of the number of different people - most individuals go years between this)

No I’m keeping up with the system as a whole because I’m always working at a system level when I’m using AI instead of worrying about the “how”

No you’re not. The “how” is your job to understand, and if you don’t you’ll end up like the devs in the article.

We as an industry have been able to offload a lot of “how” via deterministic systems built by humans with expert understanding. LLMs give you the illusion of this.

No in my case the “how” is

1. I spoke to sales to find out about the customer

2. I read every line of the contract (SOW)

3. I did the initial requirements gathering over a couple of days with the client - or maybe up to 3 weeks

3. I designed every single bit of AWS architecture and code

4. I did the design review with the client

5. I led the customer acceptance testing

> We as an industry have been able to offload a lot of “how” via deterministic systems built by humans with expert understanding. LLMs

I assure you the mid level developers or god forbid foreign contractors were not “experts” with 30 years of coding experience and at the time 8 years of pre LLM AWS experience. It’s been well over a decade - ironically before LLMs - that my responsibility was only for code I wrote with my own two hands