> Over the past year, I’ve watched engineers use AI to ship in days what used to take a team weeks.
No, you didn't. You watched engineers use AI to ship in days something that looks like what used to take a team weeks. After enough rounds of feature evolution, you'll realise that what they actually shipped isn't at all the same. Anthropic's C compiler, which also seemed like a good start that would have taken people much longer to deliver, ended up being impossible to turn into something actually workable.
In a year or so, software developed by "AI-native talent who can manage fleets of agents to drive outsized impact" - which is another way of saying people who ship code they don't understand and therefore haven't fixed the architectural mistakes the agents make - will become impossible to evolve, and then things will get very interesting.
AI can help software developers in many ways, but not like that.
AI definitely leads to some productivity gain but the claims of 10x, 100x, 1000+x are (for now) irrational exuberance. Churning out prototype software has always been quick, and now it's blazing. But these LLMs are like Happy Gilmore. They get to the green in one shot then they orbit the hole with an extremely dubious short game. The virtue is in their parallelizability but you still need to review their work, lest you come back to it wrestling an alligator while a ruined TV tower husk sends spark showers over the pin.
> But these LLMs are like Happy Gilmore. They get to the green in one shot then they orbit the hole with an extremely dubious short game.
Except that he got good at his short game by the end. LLMs will get there sooner than we think.
I don’t think they will, though. The “short game” is matching the requirements of the agent operator. If we don’t care about the finer details that we let the LLMs infer, then we shouldn’t care if a human infers them (and yet we do).
I think LLMs are great, and I think people who can use them to get to the green in one and take it from there will soar, just like people who could identify a problem and solve it themselves did in the past.
I am an engineer. I hire other engineers. I run a company that ships usable software for small businesses.
We do this every day. I'm sorry to say, we are indeed shipping in days what used to take weeks.
As a software engineer who also hires other software engineers, I’m curious about the disconnect in our experiences.
I do systems programming. Before AI, feature development roughly went: design, implement, test, review, with some back edges and a lot of time spent in test and review.
AI has made the implementation part much faster, at the cost of even more time spent testing and reviewing, though still an improvement overall.
We do not see the weeks to days improvement though. The bottleneck before was testing and reviewing, and they are even bigger bottlenecks now.
What kind of work do you do, and what kind of workflow were you using before and after AI to benefit so much?
> I do systems programming.
I'll stop you right there. AI is not good at systems programming, it's good at CRUD web development, which is where most people are seeing the gains.
I think antirez mentioned somewhere he considered it particularly good at systems programming.
>95% of software development is CRUD.
It's really not, though. As soon as systems have to scale, regulatory requirements come in, etc. it becomes more complex.
AI has solved simple CRUD, yes, but CRUD was easy before.
>AI has made the implementation part much faster, at the cost of even more time spent testing and reviewing,
Maybe they're using AI for testing and reviewing more than you are, not just for coding?
The "AI implementation" step in my workflow includes separate agents dedicated to testing and reviewing changes. The self feedback loop catches a lot of errors and mistakes, but it rarely produces working code in one go.
In my experience, the generated code handles the happy path, but isn't great about edge cases or writing clean code, even with explicit instruction in the initial prompt.
We usually end up doing multiple iterations with what claude/codex output, pointing out issues, asking for changes, etc.
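For what it's worth, that loop is roughly this shape (a minimal Python sketch; `run_agent`, the role names, and the "LGTM" check are made-up stand-ins for whatever model or CLI you actually wire in, not any vendor's API):

    # Sketch of the implement -> test -> review self-feedback loop described above.
    # run_agent() is a hypothetical wrapper around whatever model/CLI you use;
    # the point is the structure, not the API.

    def run_agent(role: str, prompt: str) -> str:
        """Send a prompt to an agent playing the given role and return its reply."""
        raise NotImplementedError("wire this up to your model of choice")

    def implement_feature(spec: str, max_rounds: int = 3) -> str:
        patch = run_agent("implementer", f"Implement this feature:\n{spec}")
        for _ in range(max_rounds):
            tests = run_agent("tester", f"Write and run tests for this change:\n{patch}")
            review = run_agent("reviewer", f"Review this change:\n{patch}\n\nTest report:\n{tests}")
            if "LGTM" in review:  # crude stop condition; in practice you parse structured output
                break
            patch = run_agent("implementer", f"Revise the change to address:\n{review}\n\nSpec:\n{spec}")
        return patch  # still goes to a human for final review

Even with that structure, it rarely lands working code in one go, which is why the human review at the end isn't optional.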
I had never touched Kubernetes, and in one week I have a few nodes running and I understand a lot of it. Not perfect, but not bad.
I have recently learned Kubernetes without AI and one week is more than enough to understand most of it.
This is definitely not true. But I doubt the GP understands “most” of Kubernetes either. They probably have a good working knowledge of the important, commonly used features.
…it definitely is true, I spun up a cluster at home to learn it for a new job and felt comfortable with the basics within a few days.
That was the usual experience pre-AI.
Anytime you hear such wild claims, imagine a typical code sweatshop (not just CRUD apps but templated e-shops, business pages, etc.), not a system that will evolve for another 10-20 years beyond the initial implementation and is the backend cornerstone of some part of some corporation. That is, in the cases where it's actually true; there is a ton of PR happening here, plus another gigaton of uncritical fanboyism, like with any hot topic.
Now there may be an additional corner case or twenty where it's still valid, but they are not your typical software engineering work.
My experience matches yours: even a 100x code delivery improvement would barely move the needle on project delivery where I work. Better, more automated integration and end-to-end functional tests that reflect real-world usage/data flows would actually make a much bigger difference, and there's no reason to think LLMs couldn't deliver that in the near future.
Not the OP, but it might be that AI isn't as good at systems programming as it is at other domains, or it might be that you're using it differently than I am. I don't know which one it is (maybe AI just isn't good at writing the language you work with).
For things like web frontends/backends, though, it works beautifully. I ship things in days that would take me weeks to write by hand, and I'm very fast at writing things by hand. The AI also ships many fewer bugs than our average senior programmer, though maybe not fewer bugs than our staff programmers.
In my experience, AI has had far, far more bugs than most of what I'd call senior engineers, but far fewer than juniors.
The boost is for what are glorified CRUD apps, where it 1000x's the tedious work. However, the choices it makes along the way quickly blow up without cleanup. Seniors know how to keep their workstation clean, or they should.
It sounds like we have opposite experiences.
The only way you could possibly know that is if you're reviewing the code, which means you're not "managing fleets of agents". If you're not reviewing the code (and you wouldn't be if you're managing fleets of agents), then you have no way to tell what you're shipping.
It’s under-appreciated that a proper review takes at least as long as the actual work: it’s all the same time spent understanding the challenge and coming up with the best solution, minus the time spent typing in your solution (almost never a significant amount), plus the time spent understanding their solution and explaining how to get from theirs to yours.
Can you link to a changelog that shows the 5-10x feature increases? I keep hearing this, but I don’t see anything I use ever actually shipping like this, or people backing this up with any sort of proof.
Does what you ship involve hundreds of lines of HTML/CSS by any chance? Do you care about accessibility?
Give an example.
I have an example in my line of work. Full service rewrite in a new language. Would have taken forever without AI. AI makes it easier, faster. The service has better throughput, uses less machines. Having a complete full test harness that allows us to ensure we are meeting all the functionality of the previous service is key. AND we are keeping the old service on standby because we know we don't know what might be wrong with the new one.
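To be concrete about the harness: it's essentially a differential test that replays the same requests against the old and new services and diffs the responses. A rough Python sketch (the ports, endpoint, and payload here are made up for illustration; the real thing replays captured production traffic):

    # Parity harness sketch for a service rewrite: send each recorded request
    # to both the old and the new service and flag any divergence.
    import json
    import urllib.request

    OLD_BASE = "http://localhost:8001"   # existing service, kept on standby
    NEW_BASE = "http://localhost:8002"   # rewritten service under test

    def call(base: str, path: str, payload: dict) -> dict:
        req = urllib.request.Request(
            base + path,
            data=json.dumps(payload).encode(),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)

    def check_parity(recorded: list[tuple[str, dict]]) -> list[str]:
        mismatches = []
        for path, payload in recorded:
            old, new = call(OLD_BASE, path, payload), call(NEW_BASE, path, payload)
            if old != new:
                mismatches.append(f"{path}: {old!r} != {new!r}")
        return mismatches

    if __name__ == "__main__":
        # In practice the requests come from captured traffic, not a hardcoded list.
        for line in check_parity([("/quote", {"sku": "A-1", "qty": 3})]):
            print(line)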
What's your example?
If you carefully review the code then you're not doing what Armstrong was talking about. If you're not reviewing the code, then you don't really know what it is that the AI built. Of course it passes tests; that's not the problem. The problem is that the code is complicated and obtuse, even if it doesn't seem that way on the surface, and after some rounds of evolution, the agents are no longer able to evolve or maintain the code.
The difference between "it's working now" and "it will continue working in two years" is exactly the problem with AI-generated code, because the tests can't tell you that, and you don't know which one you have if you don't look really carefully.
What you are shipping is not the same as what Coinbase is shipping. These are vastly different things. Making a shiny app with AI is great, I'm doing it as I type this. But I am under no delusion that what I make can sustain a multi-million dollar or even billion dollar business in the case of Coinbase. That's plain silly.
Shipping garbage.
Ever notice how people making this claim never come with receipts?
I commented this yesterday, I’ll repeat it again - what do you guys think organizations that have heavily leaned into AI are shipping nowadays?
Most devs aren’t working on cutting-edge, low-level, mission-critical systems, and AI is great for that kind of work. Every company I personally know has been fast-shipping features that are being used daily by millions of people for the past 7 months.
We have the same thing on my team, and we also understand the limitations of AI generated code. If you’re more or less experienced, you can easily see the “good” and “bad” sides of it. So you kinda plan it out in a way that you can “evolve AI generated software”. I wouldn’t have said the same thing in January 2025, but it’s much different times now. Things are already working.
> If you’re more or less experienced, you can easily see the “good” and “bad” sides of it. So you kinda plan it out in a way that you can “evolve AI generated software”.
If you're truly "managing fleets of agents" there's no way you're able to sift through the good and the bad in the output. If your AI-generated code is evolvable (which is hard to tell right now) then you're not writing it with "fleets of agents". If you are writing it with fleets of agents, I would bet it's not evolvable; you just haven't reached the breaking point yet.
We’re not managing fleets of agents. They’re not productive for our workflows yet. It’s usually a couple of CC CLIs running and going back and forth on specific tasks we closely control.
They're not productive for any workflow is my point because they don't produce sustainable software, yet that's exactly what Armstrong is calling for. They don't work, and people experienced with AI workflows already know that.
If you review the code and tell the agent to revert when it gets things wrong (not functionally but architecturally) you're fine. That's not what I was responding to.
You're just wrong on this though, and I don't know why you aren't realizing it's a skill issue on your part
Nah, it's a skill issue on the part of those who believe in "agent swarms" (in fact, that's how I recognise AI noobs; they think swarms work). Studies (like this [1]) and Anthropic's experiments have told us they don't. We do experiments with software correctness and formal methods experts who actually dive deep into "swarm outputs" and try to put evolutionary pressure on them. Swarms simply cannot (yet) produce viable software. They do, however, produce software that for a while passes tests. What I think is happening is that people who believe swarms work just look at test results. But obviously, every software engineer has known for decades that tests can only tell you if your software works today; they can't tell you that it will work tomorrow. And the people who say that unreviewed agent output will work tomorrow are those who didn't review it closely enough, so they have no idea, either.
[1]: https://arxiv.org/abs/2603.03823
Most of the people making this argument vastly overestimate the quality of engineering and discipline behind the software powering most corporations. CRUD apps are likely to be the most prominent type of application across industries, and most of them are crud.
If the code is really simple, it's cheap to read it. When people don't read it (and when they need to use "fleets of agents"), it's because it's not so simple, and then the people who trust the outcome are those who don't know what it is that they've committed into the codebase. Their logic is no more than: the system hasn't collapsed under the load of 50 (or 500) changes so it probably won't collapse under the load of the next 500 (or 5000). Because that's how engineered systems work, right? If they're fine under light stress, they're fine under heavier stress.
> Because that's how engineered systems work, right? If they're fine under light stress, they're fine under heavier stress.
Isn't this wrong? I thought engineered systems meant something designed with limits.
I was being sarcastic.
Yes, it can. I do this regularly.
I have literally built and shipped multiple things that would have taken me many many months to do and I’ve done it in under a week.
Many of these are LLM-heavy features where the LLM can literally self-evaluate and self-optimize. I start with a general feature; it will generate adversarial, synthetic data, build the feature, optimize it, then figure out new places to improve. A year ago, this would have taken an entire team months to do; now it's 2 or 3 days of work.
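The loop looks roughly like this (a hand-wavy Python sketch; `ask_llm` and the convergence check are placeholders for whatever model and eval you actually use):

    # Sketch of the generate -> evaluate -> improve loop described above.
    def ask_llm(prompt: str) -> str:
        raise NotImplementedError("plug in your model call")

    def build_feature(description: str, rounds: int = 5) -> str:
        cases = ask_llm(f"Generate adversarial, synthetic test cases for:\n{description}")
        solution = ask_llm(f"Implement:\n{description}")
        for _ in range(rounds):
            evaluation = ask_llm(
                "Evaluate this implementation against these cases; "
                f"list failures and weak spots:\n{solution}\n\nCases:\n{cases}"
            )
            if "no issues found" in evaluation.lower():  # crude convergence check
                break
            solution = ask_llm(f"Improve the implementation to address:\n{evaluation}\n\n{solution}")
        return solution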
The C compiler was a prime example of an application where the LLM can self-evaluate/optimise, with one of the best sets of tests you could imagine. Yet the end result was a mess.
I have experienced areas where high productivity can be had without much loss in quality. So I can believe it. But it really depends on what you’re doing and I firmly believe many companies will run out of easy stuff that we can blaze through with AI fairly quickly. At least that’s where we seem to be heading
What's an example of such a thing? Just curious
And your parents must be proud of you. You’re just another cog
Yeah, absolutely embarrassing take. If I had a nickel for every time someone sent me some AI garbage that was supposedly "thoroughly vetted and cross-checked agent output", I'd be at least a thousandaire (gotta keep it real).
There are strengths, but if your approach is having it write streams of code and just using them as is, I would LOVE to compete against you.
People that manage AI agents are not engineers as they do no engineering but are instead just supervisors.
only dorks care this much about being an "engineer" or "artist". Who gives a shit if misanthropes on websites consider you a real engineer?
Early in my career, people said this about programmers who (weakly) insisted on using assemblers.
Then, about people using high-level languages like C.
Then, about people using C++.
Then, about people using "toy"/"scripting" languages like PHP and Python.
About people who use ORMs instead of writing SQL directly.
About people who use JavaScript ("not a real programming language" was the dis).
People used to argue that it was the mark of a tourist to use anything more visual than Emacs.
This slight won't stick, nobody cares, and it might end up sounding stupid later. You can't usefully insult a professional engineer in 2026 by pointing out that they haven't memorized ASCII or the Arm instruction set.
What is the difference between a supervisor and an architect in the tech product space?
> In a year or so
Look at the best models from Spring 2025 and compare them with now (and do the same for Spring 2024 versus Spring 2025). Armstrong and lots of others are betting that this trend will continue, and if it does, the LLMs will ship code the LLMs understand, and whether any human specifically understands any particular part will mostly not matter.
> the LLMs will ship code the LLMs understand, and whether any human specifically understands any particular part will mostly not matter.
I find this particularly funny. There were more than a couple Star Trek Episodes where some alien planet depends on some advanced AI or other technology that they no longer understand, and it turns out the AI is actually slowly killing them, making them sterile, etc. (e.g. https://en.wikipedia.org/wiki/When_the_Bough_Breaks_(Star_Tr... )
Sure, Star Trek is fiction, but "humans rely on a technology that they forget how to make" is a pretty recurrent theme in human history. The FOGBANK saga was pretty recent: https://en.wikipedia.org/wiki/Fogbank
It just amazes me that people think "Sure, this AI generated code is kinda broken now, but all we need is just more AI code to fix it at some unknowable point in the future because humans won't be able to understand it!"
If you'd told me 20-30 years ago we'd actually get the Star Trek computer in the mid-2020s and it still wouldn't be actually AGI, I would have thought that very strange and unlikely, so who knows?
So nothing about the last 3 years has caused you to update your beliefs on this stuff? Feels like bitter cope.
And if the trend doesn't continue? I understand that a company with Coinbase's performance has little to lose and not many options, but many companies are in a better position.
The problem is that executives could take the 15-20% productivity boost and be content, but they read stuff like this, get greedy, and they don't understand the risk they're taking.
Even if the trend doesn’t continue, the current models are very very good. They’re better than the average programmer in the industry, already.
I don't know how anyone who carefully and closely reviews their output could possibly think that. Much of the time their code is fine, but every now and again they make a catastrophic (though often well-hidden) mistake that is so bad that all the tests pass but the codebase will be bricked if enough of those go in. They make such disastrous mistakes frequently enough that a decent-sized codebase can't last for more than 18-24 months.
If the average programmer is this bad, then there must be better-than-average programmers reviewing the code. The problem with agents is that they can produce code at a far higher volume than the average programmer.
Anyway, I don't know how well the average programmer programs, but if you commit agent-generated code without careful review, your codebase will be cooked in a year or two.
Maybe at some coding benchmark. Certainly not at actually shipping and maintaining production grade software.
Agreed! That will be an... "interesting" outcome, if so, for a lot of these companies.
> and whether any human specifically understands any particular part will mostly not matter.
This is how I feel. It’s building things for me that work. I don’t care how it works under the hood in many cases.
It's not about caring how it works. It's about caring that it keeps working at all even after you add stuff to it for a year or three (and nearly all software written by companies is software they evolve).
And who’s to say it won’t? It’s working now. I’m adding stuff and it’s still working. Why won’t that continue in year 3?
If you carefully read the agent's output you'll see why. It adds layers upon layers of workarounds and defences that hide serious problems, until the codebase reaches a point where the agent can no longer understand it and work with it. All the tests pass right up until the moment when adding a feature or fixing a bug causes another bug, and then nothing and no one can save the codebase anymore.
Maybe a year ago? Right now the LLMs I mainly use (GPT5.5, Opus 4.7) will intuit exactly what I need from my brief specs and universally go above-and-beyond in creating code that is not only extremely high-quality, but catches a ton of the gotchas I would have stumbled on, in advance.
Just a minute ago, 5.5 looked at some human-written code of mine from last year, and while it was making the changes I asked for, it determined the existing code was too brittle (it was) and rewrote it better. It didn't mention this in its summary at the end; I only know because I often watch the thinking output as it goes past, before it hides it all behind a pop-open.
Interesting that we’ve had such different experiences. I was working with both of those models today, and on several occasions they proposed some pretty poor solutions.
I also find I need to run an LLM code review or two against any code they produce to even get to the point where it’s ready for human review.
In any case they served as an extremely valuable tool.
I use GPT 5.5. Sometimes it does what you say. It certainly finds silly mistakes in my code better than I could. But frequently enough it makes catastrophic architectural mistakes in its own code.
Maintaining software is like 80% of the job.
Because the APIs it uses will change? Nothing in tech is static. And that’s just going to get worse re: this whole AI thing.