> In a year or so

Look at the best models from Spring 2025 and compare with now (and similarly compare Spring 2024 with Spring 2025). Armstrong and lots of others are betting that this trend will continue, and if it does, the LLMs will ship code the LLMs understand, and whether any human specifically understands any particular part will mostly not matter.

> the LLMs will ship code the LLMs understand, and whether any human specifically understands any particular part will mostly not matter.

I find this particularly funny. There are more than a couple of Star Trek episodes where some alien planet depends on an advanced AI or other technology that its people no longer understand, and it turns out the AI is actually slowly killing them, making them sterile, etc. (e.g. https://en.wikipedia.org/wiki/When_the_Bough_Breaks_(Star_Tr... )

Sure, Star Trek is fiction, but "humans relying on a technology they've forgotten how to make" is a pretty recurrent theme in human history. The FOGBANK saga was pretty recent: https://en.wikipedia.org/wiki/Fogbank

It just amazes me that people think, "Sure, this AI-generated code is kinda broken now, but all we need is more AI code to fix it at some unknowable point in the future, because humans won't be able to understand it!"

If you'd told me 20-30 years ago that we'd actually get the Star Trek computer in the mid-2020s and it still wouldn't actually be AGI, I would have thought that very strange and unlikely, so who knows?

So nothing about the last 3 years has caused you to update your beliefs on this stuff? Feels like bitter cope.

And if the trend doesn't continue? I understand that a company with Coinbase's performance has little to lose and not many options, but many companies are in a better position.

The problem is that executives could take the 15-20% productivity boost and be content, but they read stuff like this, get greedy, and don't understand the risk they're taking.

Even if the trend doesn’t continue, the current models are very, very good. They’re already better than the average programmer in the industry.

I don't know how anyone who carefully and closely reviews their output could possibly think that. Much of the time their code is fine, but every now and again they make a catastrophic (though often well-hidden) mistake: all the tests pass, but if enough of those mistakes land, the codebase gets bricked. They make such disastrous mistakes frequently enough that a decent-sized codebase can't survive more than 18-24 months of them.

If the average programmer is this bad, then there must be better-than-average programmers reviewing the code. The problem with agents is that they can produce code at a far higher volume than the average programmer.

Anyway, I don't know how well the average programmer programs, but if you commit agent-generated code without careful review, your codebase will be cooked in a year or two.

Maybe at some coding benchmark. Certainly not at actually shipping and maintaining production-grade software.

Agreed! That will be an... "interesting" outcome, if so, for a lot of these companies.

> and whether any human specifically understands any particular part will mostly not matter.

This is how I feel. It’s building things for me that work. I don’t care how it works under the hood in many cases.

It's not about caring how it works. It's about caring that it keeps working at all even after you add stuff to it for a year or three (and nearly all software written by companies is software they evolve).

And who’s to say it won’t? It’s working now. I’m adding stuff and it’s still working. Why won’t that continue in year 3?

If you carefully read the agent's output you'll see why. It adds layers upon layers of workarounds and defences that hide serious problems, until the codebase reaches a point where the agent can no longer understand it and work with it. All the tests pass right up until the moment when adding a feature or fixing a bug causes another bug, and then nothing and no one can save the codebase anymore.
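
To make that concrete, here's a contrived Python sketch (all names hypothetical, not from any real project or agent transcript) of what that layering tends to look like: each patch silences a symptom, the happy-path test stays green, and the underlying bug never gets fixed.

    # Purely hypothetical illustration: each "defensive" layer keeps the tests
    # green while burying the real bug, which is that the upstream feed sends
    # prices in cents, not dollars.

    def parse_price(raw: str) -> float:
        return float(raw)  # the unit mismatch starts here and is never fixed

    def get_price(raw: str) -> float:
        try:
            price = parse_price(raw)
        except ValueError:
            return 0.0                  # workaround 1: bad input silently becomes "free"
        return min(price, 10_000.0)     # workaround 2: clamp prices that "look too big"

    def total(prices: list[str]) -> float:
        # workaround 3: quietly skip anything that still looks wrong
        return sum(p for p in map(get_price, prices) if p > 0)

    # The happy-path test stays green through all of this, so nothing flags the rot.
    assert total(["20.00", "5.00"]) == 25.0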

Maybe a year ago? Right now the LLMs I mainly use (GPT5.5, Opus 4.7) will intuit exactly what I need from my brief specs and universally go above and beyond, producing code that is not only extremely high-quality but also catches, in advance, a ton of the gotchas I would have stumbled on.

Just a minute ago, 5.5 looked at some human-written code of mine from last year, and while it was making the changes I asked for, it determined the existing code was too brittle (it was) and rewrote it better. It didn't mention this in its summary at the end; I only know because I often watch the thinking output as it goes by, before it all gets hidden behind a pop-open.

Interesting that we’ve had such different experiences. I was working with both of those models today, and on several occasions they proposed some pretty poor solutions.

I also find I need to run an LLM code review or two against any code they produce just to get to the point where it’s ready for human review.

In any case, they served as an extremely valuable tool.

I use GPT 5.5. Sometimes it does what you say. It certainly finds silly mistakes in my code better than I could. But frequently enough it makes catastrophic architectural mistakes in its own code.

Maintaining software is like 80% of the job.

Because the APIs it uses will change? Nothing in tech is static. And that’s just going to get worse re: this whole AI thing.