The problem I see is not so much in how you generate the code; it's in how you maintain it. If you check in the AI-generated code unchanged, do you start changing that code by hand later? Do you trust that in the future AI can fix bugs in your code? Or do you clean up the AI-generated code first?
LLMs remove the familiarity of “I wrote this and deeply understand this”. In other words, everything is “legacy code” now ;-)
For those who are less experienced with the constant surprises that legacy code bases can provide, LLMs are deeply unsettling.
This is the key point for me in all this.
I've never worked in web development, where it seems to me the majority of LLM coding assistants are deployed.
I work on safety critical and life sustaining software and hardware. That's the perspective I have on the world. One question that comes up is "why does it take so long to design and build these systems?" For me, the answer is: that's how long it takes humans to reach a sufficient level of understanding of what they're doing. That's when we ship: when we can provide objective evidence that the systems we've built are safe and effective. These systems we build, which are complex, have to interact with the real world, which is messy and far more complicated.
Writing more code means more complexity for humans (note the plural) to understand. Hiring more people means more people who need to understand how the systems work. Want to pull in the schedule? That means humans have to reach that understanding in less time. Want to use Agile or this coding tool or that editor or this framework? Fine, these tools might make certain tasks a little easier, but none of that is going to remove the requirement that humans need to understand complex systems before they will work in the real world.
So then we come to LLMs. It's another episode of "finally, we can get these pesky engineers and their time wasting out of the loop". Maybe one day. But we are far from that today. What matters today is still how well do human engineers understand what they're doing. Are you using LLMs to help engineers better understand what they are building? Good. If that's the case you'll probably build more robust systems, and you _might_ even ship faster.
Are you trying to use LLMs to fool yourself into thinking this still isn't the game of humans needing to understand what's going on? "Let's offload some of the understanding of how these systems work onto the AI so we can save time and money". Then I think we're in trouble.
I don't think "understanding" should be the criterion; you can't commit your eyes along with the PR. What you can commit is a test that enforces that understanding programmatically. And we can write many, many more tests now than before. You just need to ensure the testing is deep and well designed.
" They make it easier to explore ideas, to set things up, to translate intent into code across many specialized languages. But the real capability—our ability to respond to change—comes not from how fast we can produce code, but from how deeply we understand the system we are shaping. Tools keep getting smarter. The nature of learning loop stays the same."
https://martinfowler.com/articles/llm-learning-loop.html
Learning happens when your ideas break, when code fails, when unexpected things happen. To give a coding agent that experience, you need to provide it with a sensitive skin made of tests: they are what deliver pain feedback to the agent. Inside a good test harness the agent can't break things; it moves in a safe space with greater efficiency than before. It was the environment providing us with understanding all along, and we should build an environment where the AI can understand the effects of its actions.
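A minimal sketch of that "skin of tests" idea, in Python with pytest and invented names (apply_discount, a discount rule capped at 50%): behavioral tests that give the agent immediate pain feedback the moment an edit changes observable behavior.

    import pytest

    def apply_discount(price: float, percent: float) -> float:
        # Hypothetical rule: discounts are capped at 50% and prices must be non-negative.
        if price < 0:
            raise ValueError("price must be non-negative")
        return price * (1 - min(percent, 50) / 100)

    def test_discount_is_capped_at_fifty_percent():
        # If an agent "optimizes" away the cap, this fails immediately.
        assert apply_discount(price=100.0, percent=80) == 50.0

    def test_negative_prices_are_rejected():
        with pytest.raises(ValueError):
            apply_discount(price=-1.0, percent=10)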
> Are you trying to use LLMs to fool yourself into thinking this still isn't the game of humans needing to understand what's going on?
This is a key question. If you look at all the anti-AI stuff around software engineering, the pervading sentiment is “this will never be a senior engineer”. Setting aside the possibility of future models actually bridging this gap (this would be AGI), let’s accept this as true.
You don’t need an LLM to be a senior engineer to be an effective tool, though. If an LLM can turn your design into concrete code more quickly than you could, that gives you more time to reason over the design, the potential side effects, etc. If you use the LLM well, it allows you to give more time to the things the LLM can’t do well.
Why can't you use LLMs with formal methods? Mathematicians are using LLMs to develop complex proofs. How is that any different?
I don't know why you're being downvoted, I think you're right.
I think LLMs need different coding languages, ones that emphasise correctness and formal methods. I think we'll develop specific languages for using LLMs with that work better for this task.
Of course, training an LLM to use it then becomes a chicken/egg problem, but I don't think that's insurmountable.
maybe. I think we're really just starting this, and I suspect that trying to fuse neural networks with symbolic logic is a really interesting direction to try to explore.
that's kind of not what we're talking about. a pretty large fraction of the community thinks programming is stone cold over because we can talk to an LLM and have it spit out some code that eventually compiles.
personally I think there will be a huge shift in the way things are done. it just won't look like Claude.
I suspect that we are going to have a wave of gurus who show up soon to teach us how to code with LLMs. There’s so much doom and gloom in these sorts of threads about the death of quality code that someone is going to make money telling people how to avoid that problem.
The scenario you describe is a legitimate concern if you’re checking in AI generated code with minimal oversight. In fact I’d say it’s inevitable if you don’t maintain strict quality control. But that’s always the case, which is why code review is a thing. Likewise you can use LLMs without just checking in garbage.
The way I've used LLMs for coding so far is to give instructions and then iterate on the result (manually or with further instructions) until it meets my quality standards. It's definitely slower than just checking in the first working thing the LLM churns out, but it's still been faster than doing it myself, and I understand it exactly as well, because I have to in order to give instructions (design) and iterate.
My favorite definition of “legacy code” is “code that is not tested” because no matter who writes code, it turns into a minefield quickly if it doesn’t have tests.
> My favorite definition of “legacy code” is “code that is not tested” because no matter who writes code, it turns into a minefield quickly if it doesn’t have tests.
Unfortunately, "tests" don't do it; they have to be "good tests". I know, because I work on a codebase that has a lot of tests: some modules have good tests, and some might as well not have tests, because their tests just tell you that you changed something.
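A contrived, self-contained illustration of that difference: the first test only freezes the current output, so it can only tell you that "something changed"; the second states the requirement it protects, so a failure actually means a rule broke.

    def render_receipt(items: dict[str, float], tax_rate: float) -> str:
        subtotal = sum(items.values())
        lines = [f"{name}: {price:.2f}" for name, price in items.items()]
        return "\n".join(lines + [f"Total: {subtotal * (1 + tax_rate):.2f}"])

    # Change detector: breaks on any reordering or rewording, reveals nothing about intent.
    def test_receipt_snapshot():
        assert render_receipt({"tea": 40.0}, 0.05) == "tea: 40.00\nTotal: 42.00"

    # Good test: encodes the rule (the total includes tax), which is what reviewers
    # and future maintainers actually need enforced.
    def test_receipt_total_includes_tax():
        assert "Total: 42.00" in render_receipt({"tea": 40.0}, 0.05)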
How do you know that it's actually faster than if you'd just written it yourself? I think the review and iteration part _is_ the work, and the fact that you started from something generated by an LLM doesn't actually speed things up. The research that I've seen also generally backs this idea up -- LLMs _feel_ very fast because code is being generated quickly, but they haven't actually done any of the work.
Because I’ve been a software engineer for over 20 years. If I look at a feature and feel like it will take me a day and an LLM churns it out in a hour including the iterating, I’m confident that using the LLM was meaningfully faster. Especially since engineers (including me) are notoriously bad at accurate estimation and things usually take at least twice as long as they estimate.
I have tested throwing several features at an LLM lately and I have no doubt that I'm significantly faster when using an LLM. My experience matches what Antirez describes. This doesn't make me 10x faster, mostly because so much of my job is not coding. But in terms of raw coding, I can believe it's close to 10x.
I'll back this up. I've also been a dev for over 20 years, my agentic workflow sounds the same or similar to yours (I'm using an agentic IDE) and I am finishing tasks significantly faster than before I adapted to using the agent. In fact people have noticed to the point that I have been asked to show team members what I am doing. I don't really understand why it seems like the majority of HN believes this is impossible.
I see where you're coming from, and I agree with the implication that this is more of an issue for inexperienced devs. Having said that, I'd push back a bit on the "legacy" characterization.
For me, if I check in LLM-generated code, it means I've signed off on the final revision and feel comfortable maintaining it to a similar degree as though it were fully hand-written. I may not know every character as intimately as that of code I'd finished writing by hand a day ago, but it shouldn't be any more "legacy" to me than code I wrote by hand a year ago.
It's a bit of a meme that AI code is somehow an incomprehensible black box, but if that is ever the case, it's a failure of the user, not the tool. At the end of the day, a human needs to take responsibility for any code that ends up in a product. You can't just ship something that people will depend on not to harm them without any human ever having had the slightest idea of what it does under the hood.
Take responsibility by leaving good documentation of your code and a beefy set of tests; future agents and humans will then have a point to bootstrap from, not just plain code.
Yes, that too, but you should still review and understand your code.
I think it was Cory Doctorow who compared AI-generated code to asbestos. Back in its day, asbestos was in everything, because of how useful it seemed. Fast forward decades and now asbestos abatement is a hugely expensive and time-consuming requirement for any remodeling or teardown project. Lead paint has some of the same history.
Get your domain names now! AI Slop Abatement, the major growth industry of the 2030s.
You don't just code with AI, you provide 2 things
1. a detailed spec, the result of your discussions with the agent about the work; once it gets it, you ask the agent to formalize it into docs
2. an extensive suite of tests to cover every angle; the tests are generated, but you have to ensure their quality, coverage and depth
I think, to make a metaphor, that specs are like the skeleton of the agent, tests are like the skin, while the agent itself is the muscle and cerebellum, and you are the PFC. Skeleton provides structure and decides how the joints fit, tests provide pain and feedback. The muscle is made more efficient between the two.
In short the new coding loop looks like: "spec -> code -> test, rinse and repeat"
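A minimal sketch of what that loop can look like in practice (all names invented): each generated test cites the spec clause it enforces, so the human's review of the suite is about coverage and depth against the spec, not against the code.

    def normalize_username(raw: str) -> str:
        """SPEC 2.1: usernames are case-insensitive; surrounding whitespace is ignored."""
        return raw.strip().lower()

    # Tests reference spec clauses, making gaps in coverage visible during review.
    def test_spec_2_1_usernames_are_case_insensitive():
        assert normalize_username("Alice") == normalize_username("alice")

    def test_spec_2_1_surrounding_whitespace_is_ignored():
        assert normalize_username("  alice ") == "alice"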
This is exactly my workflow. I use an agentic IDE and:
- discuss and investigate the feature with the agent, in depth
- formalize it into a plan
- tell it to follow the plan
- babysit it and review everything
This is actually more enjoyable than it sounds, to me at least. I can often work on more than 1 task at a time and I am freed from doing the coding and take more of an architect-type role, especially when creating something new.
I absolutely do not just tell Claude Code to go do something and then come back and check it in to git. I am actually not yet sure if these terminal-based long-running agents can be used for much other than vibe coding. Use an IDE and watch it closely if you want to make production quality code.
Is it really much different from maintaining code that other people wrote and that you merged?
Yes, this is (partly) why developer salaries are so high. I can trust my coworkers in ways not possible with AI.
There is no process solution for low performers (as of today).
The solution for low performers is very close oversight. If you imagine an LLM as a very junior engineer who needs an inordinate amount of hand holding (but who can also read and write about 1000x faster than you and who gets paid approximately nothing), you can get a lot of useful work out of it.
A lot of the criticisms of AI coding seem to come from people who think that the only way to use AI is to treat it as a peer. “Code this up and commit to main” is probably a workable model for throwaway projects. It’s not workable for long term projects, at least not currently.
A Junior programmer is a total waste of time if they don't learn. I don't help Juniors because it is an effective use of my time, but because there is hope that they'll learn and become Seniors. It is a long term investment. LLMs are not.
It’s a metaphor. With enough oversight, a qualified engineer can get good results out of an underperforming (or extremely junior) engineer. With a junior engineer, you give the oversight to help them grow. With an underperforming engineer you hope they grow quickly or you eventually terminate their employment because it’s a poor time trade off.
The trade off with an LLM is different. It’s not actually a junior or underperforming engineer. It’s far faster at churning out code than even the best engineers. It can read code far faster. It writes tests more consistently than most engineers (in my experience). It is surprisingly good at catching edge cases. With a junior engineer, you drag down your own performance to improve theirs and you’re often trading off short term benefits vs long term. With an LLM, your net performance goes up because it’s augmenting you with its own strengths.
As an engineer, it will never reach senior level (though future models might). But as a tool, it can enable you to do more.
> It’s far faster at churning out code than even the best engineers.
I'm not sure I can think of a more damning indictment than this tbh
Can you explain why that’s damning?
I guess everyone dealing with legacy software sees code as a cost factor. Being able to delete code is harder, but often more important than writing code.
Owning code requires you to maintain it. Finding out which parts of the code actually implement features and which parts are not needed anymore (or were never needed in the first place) is really hard, since most of the time the requirements have never been documented and the authors have left or cannot remember. Not understanding what the code does removes any possibility of improving or modifying it. This is how software dies.
Churning out code fast is a huge future liability. Management wants solutions fast and doesn't understand these long term costs. It is the same with all code generators: Short term gains, but long term maintainability issues.
Do you not write code? Is your code base frozen, or do you write code for new features and bug fixes?
The fact that AI can churn out code 1000x faster does not mean you should have it churn out 1000x more code. You might have a list of 20 critical features and only have time to implement 10. AI could let you get all 20, but that shouldn't mean you check in code for 1000 features you don't even need.
Sure if you just leave all the code there. But if it's churning out iterations, incrementally improving stuff, it seems ok? That's pretty much what we do as humans, at least IME.
Sure:
[1] https://saintgimp.org/2009/03/11/source-code-is-a-liability-...
[2] https://pluralistic.net/2026/01/06/1000x-liability/
I feel like this is a forest for the trees kind of thing.
It is implied that the code being created is for “capabilities”. If your AI is churning out needless code, then sure, that’s a bad thing. Why would you be asking the AI for code you don’t need, though? You should be asking it for critical features, bug fixes, the things you would be coding up regardless.
You can use a hammer to break your own toes or you can use it to put a roof on your house. Using a tool poorly reflects on the craftsman, not the tool.
> It writes tests more consistently than most engineers (in my experience)
I'm going to nit on this specifically. I firmly believe anyone who genuinely believes this either never writes tests that actually matter, or doesn't review the tests that an LLM throws out there. I've seen so many cases of people saying 'look at all these valid tests our LLM of choice wrote', only for half of them to do nothing and the other half to be misleading about what they actually test.
This has been my experience as well. So far, whenever I’ve been initially satisfied with the one shotted tests, when I had to go back to them I realized they needed to be reworked.
It’s like anything else, you’ve got to check the results and potentially push it to fix stuff.
I recently had AI code up a feature that was essentially text manipulation. There were existing tests to show it how to write effective tests and it did a great job of covering the new functionality. My feedback to the AI was mostly around some inaccurate comments it made in the code but the coverage was solid. Would have actually been faster for me to fix but I’m experimenting with how much I can make the AI do.
On the other hand I had AI code up another feature in a different code base and it produced a bunch of tests with little actual validation. It basically invoked the new functionality with a good spectrum of arguments but then just validated that the code didn’t throw. And in one case it tested something that diverged slightly from how the code would actually be invoked. In that case I told it how to validate what the functionality was actually doing and how to make the one test more representative. In the end it was good coverage with a small amount of work.
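For readers who haven't hit this pattern, a contrived Python sketch (not the actual code base) of the difference between "invokes it and checks nothing" and tests that validate the behavior:

    def slugify(title: str) -> str:
        return "-".join(title.lower().split())

    # What the first attempt looked like in spirit: a spectrum of inputs,
    # but the only implicit assertion is "it didn't throw".
    def test_slugify_does_not_crash():
        for title in ["Hello World", "  spaced  ", "MiXeD Case"]:
            slugify(title)

    # After feedback: assert what the functionality is actually supposed to produce.
    def test_slugify_produces_expected_slugs():
        assert slugify("Hello World") == "hello-world"
        assert slugify("  spaced  ") == "spaced"
        assert slugify("MiXeD Case") == "mixed-case"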
For people who don't usually test or care much about testing, yeah, they probably let the AI create garbage tests.
>feature that was essentially text manipulation
That seems like the kind of feature where the LLM would already have the domain knowledge needed to write reasonable tests, though. Similar to how it can vibe code a surprisingly complicated website or video game without much help, but probably not create a single component of a complex distributed system that will fit into an existing architecture, with exactly the correct behaviour based on some obscure domain knowledge that pretty much exists only in your company.
> probably not create a single component of a complex distributed system that will fit into an existing architecture, with exactly the correct behaviour based on some obscure domain knowledge that pretty much exists only in your company.
An LLM is not a principal engineer. It is a tool. If you try to use it to autonomously create complex systems, you are going to have a bad time. All of the respectable people hyping AI for coding are pretty clear that they have to direct it to get good results in custom domains or complex projects.
A principal engineer would also fail if you asked them to develop a component for your proprietary system with no information, but a principal engineer would be able to do their own deep discovery and design if they have the time and resources to do so. An AI needs you to do some of that.
I don't see anything here that corroborates your claim that it outputs more consistent test code than most engineers. In fact your second case would indicate otherwise.
And this also goes back to my first point about writing tests that matter. Coverage can matter, but coverage is not codifying business logic in your test suite. I've seen many engineers focus only on coverage, only for their code to blow up in production because they didn't bother to test the actual real-world scenarios it would be used in, which requires a deep understanding of the full system.
I still feel like in most of these discussions the criticism of LLMs is that they are poor replacements for great engineers. Yeah. They are. LLMs are great tools for great engineers. They won’t replace good engineers and they won’t make shitty engineers good.
You can’t ask an LLM to autonomously write complex test suites. You have to guide it. But when AI creates a solid test suite with 20 minutes of prodding instead of 4 hours of hand coding, that’s a win. It doesn’t need to do everything alone to be useful.
> writing tests that matter
Yeah. So make sure it writes them. My experience so far is that it writes a decent set of tests with little prompting, honestly exceeding what I see a lot of engineers put together (lots of engineers suck at writing tests). With additional prompting it can make them great.
I also find it hard to agree with that part. Perhaps it depends on what type of software you write, but in my experience finding good test cases is one of those things that often requires a deep level of domain knowledge. I haven’t had much luck making LLMs write interesting, non-trivial tests.
Just like LLMs are a total waste of time if you never update the system/developer prompts with additional information as you learn what's important to communicate vs not.
That is a completely different level. I expect a junior developer to be able to completely replace me long term, and to be able to decide when existing rules are outdated and when they should be replaced. To challenge my decisions without me asking for it. To be able to adapt what they have learned to new types of projects or new programming languages. Being senior is setting the rules.
An LLM only follows rules/prompts. They can never become Senior.
Yes. Firstly, AI forgets why it wrote certain code, whereas with humans you can at least ask them when reviewing. Secondly, current-gen AI (at least Claude) kind of wants to finish the thing instead of thinking about the bigger picture. Human programmers code a little differently, in that they hate making a single-line fix in a random file to patch something in a different part of the code.
I think the second is a product of RL training that optimizes for self-contained tasks like SWE-bench.
So you live in a world where code history must only be maintained orally? Have you ever thought to ask the AI to write documentation on the what and the why, and not just write the code? Asking it to document as well as code works well when the AI needs to go back and change either.
I don't see how asking the AI to write some description of why it wrote this or that code would actually result in an explanation of why it wrote that code. It's not like it's thinking about it in that way; it's just generating both things. I guess they'd be in the same context, so it might be somewhat correct.
If you ask it to document why it did something, then when it goes back later to update the code it has the why in its context. Otherwise, the AI just sees some code later and has no idea why it was written or what it does without reverse engineering it at the moment.
I'm not sure you understood the GP comment. LLMs don't know and can't tell you why they write certain things. You can't fix that by editing your prompt so it writes it on a comment instead of telling you. It will not put the "why" in the comment, and therefore the "why" won't be in the future LLM's context, because there is no way to make it output the "why".
It can output something that looks like the "why" and that's probably good enough in a large percentage of cases.
LLMs know why they are writing things in the moment, and they can justify decisions. Asking them to write those things down when they write code works, as does asking them to design the code first and then generate/update code from the design. But yes, if things aren't written down, "the LLM doesn't know and can't tell." Don't do that.
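A sketch of that practice (names and numbers invented): have the agent record the "why" next to the decision at the moment it makes it, so a later session, or a human, starts from the rationale instead of reverse engineering it.

    import time

    class RateLimitedError(Exception):
        """Raised by the (hypothetical) payments client on HTTP 429."""

    # WHY: the payments provider rate-limits bursts and its 429s clear within a few
    # seconds; a short backoff was chosen over a queue to keep the change local
    # (the original request and discussion are recorded in the design doc).
    RETRY_DELAYS_SECONDS = [1, 2, 4]

    def charge_with_retry(charge_fn, *args, **kwargs):
        for attempt, delay in enumerate(RETRY_DELAYS_SECONDS, start=1):
            try:
                return charge_fn(*args, **kwargs)
            except RateLimitedError:
                if attempt == len(RETRY_DELAYS_SECONDS):
                    raise
                time.sleep(delay)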
I'm going to second seanmcdirmid here, a quick trick is to have Claude write a "remaining.md" if you know you have to do something that will end the session.
Example from this morning: I have to recreate the EFI disk of one of my dev VMs, which means killing the session and rebooting the VM. I had Claude write itself a remaining.md to complement the overall build_guide.vm I'm using, so I can pick up where I left off. It's surprisingly effective.
No, humans probably have tens of millions of tokens of memory per PR. That includes not only what's in the code, but everything they searched, everything they tested and how, the order they worked in, the edge cases they faced, etc. Claude just can't document all of that, or it will run out of working context pretty soon.
Ya, LLMs are not human level and have smaller focus windows, but you can "remember" things with documentation, which is what humans usually resort to once you realize that their tens of millions of tokens of memory per PR aren't reliable either.
The nice thing about LLMs, however, is that they don't grumble about writing extra documentation and tests like humans do. You just tell them to write lots of docs and they do it, they don't just do the fun coding part. I can empathize why human programmers feel threatened.
> It can output something that looks like the "why"
This feels like a distinction without a difference. This is an extension of the common refrain that LLMs cannot "think".
Rather than get overly philosophical, I would ask what the difference is in practical terms. If an LLM can write out a “why” and it is sufficient explanation for a human or a future LLM, how is that not a “why“?
Have you tried it? LLMs are quite good at summarizing. Not perfect, but then neither are humans.
> So you live in a world where code history must only be maintained orally?
There are many companies and scenarios where this is completely legitimate.
For example, a startup that's iterating quickly with a small, skilled dev team. A bunch of documentation is a liability, it'll be stale before anyone ever reads it.
Just grabbing someone and collaborating with them on what they wrote is much more effective in that situation.
> For example, a startup that's iterating quickly with a small, skilled dev team. A bunch of documentation is a liability, it'll be stale before anyone ever reads it.
This is a huge advantage for AI though, they don't complain about writing docs, and will actively keep the docs in sync if you pipeline your requests to do something like "I want to change the code to do X, update the design docs, and then update the code". Human beings would just grumble a lot, an AI doesn't complain...it just does the work.
> Just grabbing someone and collaborating with them on what they wrote is much more effective in that situation.
Again, it just sounds to me like you are arguing for why AIs are superior, not how they are inferior.
Have you never had a situation where a question arose a year (or several) later that wasn’t addressed in the original documentation?
In particular, IME the LLM generates a lot of documentation that explains the what and not much of the why (or at least, when it does, it isn't reflecting the underlying business decisions that prompted the change).
You can ask it to generate the why, even if the agent isn't doing that by default. At least you can ask it to encode how it is mapping your request to code, and to make sure that the original request is documented, so you can record why it did something, even if it can't have insight into why you made the request in the first place. The same applies to successive changes.
Depends on what you do. When I'm using LLMs to generate code for projects I need to maintain (basically, everything non-throw-away-once-used), I treat it as any other code I'd write, tightly controlled with a focus on simplicity and well-thought out abstractions, and automated testing that verify what needs to be working. Nothing gets "merged" into the code without extensive review, and me understanding the full scope of the change.
So with that, I can change the code by hand afterwards or continue with LLMs, it makes no difference, because it's essentially the same process as if I had someone follow the ideas I describe, and then later they come back with a PR. I think probably this comes naturally to senior programmers and those who had a taste of management and similar positions, but if you haven't reviewed other's code before, I'm not sure how well this process can actually work.
At least for me, I manage to produce code I can maintain, and seemingly others do too, and the codebases don't devolve into hairballs/spaghetti. But again, it requires reviewing absolutely every line and constantly editing/improving.
We recently got a PR from somebody adding a new feature and the person said he doesn't know $LANG but used AI.
The problem is, that code would require a massive amount of cleanup. I took a brief look and some code was in the wrong place. There were coding style issues, etc.
In my experience, the easy part is getting something that works for 99%. The hard part is getting the architecture right, all of the interfaces and making sure there are no corner cases that get the wrong results.
I'm sure AI can easily get to the 99%, but does it help with the rest?
> I'm sure AI can easily get to the 99%, but does it help with the rest?
Yes, the AI can help with 100% of it. But the operator of the AI needs to be able to articulate this to the AI.
I've been in this position, where I had no choice but to use AI to write code to fix bugs in another party's codebase, then PR the changes back to the codebase owners. In this case it was vendor software that we rely on, in which the vendor hadn't yet fixed critical bugs. And exactly as you described, my PR ultimately got rejected: even though it fixed the bugs in the immediate sense, it presented other issues because it didn't integrate with the external frameworks the vendor used for their dev processes. At that point it was just easier for the vendor to fix the software their way instead of accepting my PR. But the point is that I could have made the PR correct in the first place if I, as the AI operator, had the knowledge needed to articulate these more detailed and nuanced requirements to the AI. Since I didn't have that information, the AI generated code that worked but didn't meet the vendor's spec. This type of situation is incredibly easy to fall into, and it is a good example of why you still need a human at the wheel to set the guidance, even if you don't necessarily need the human to write every line of code.
I don't like the situation much but this is the reality of it. We're basically just code reviewers for AI now
I think we will find out that certain languages, frameworks and libraries are easier for AI to get all the way correct. We may even have to design new languages, frameworks and libraries to realize the full promise of AI. But as the ecosystem around AI evolves I think these issues will be solved.
Yeah, so what I'm mostly doing, and advocate for others to do, is basically the pure opposite of that.
Focus on architecture, interfaces, corner-cases, edge-cases and tradeoffs first, and then the details within that won't matter so much anymore. The design/architecture is the hard part, so focus on that first and foremost, and review + throw away bad ideas mercilessly.
Yes it does... but only in the hands of an expert who knows what they are doing.
I'd treat PRs like that as proofs of concept that the thing can be done, but I'd be surprised if they often produced code that should be directly landed.
In the hands of an expert… right. So is it not incredibly irresponsible to release these tools into the wild and expose them to those who are not experts? They will actually end up worse off. Ironically this does not 'democratise' intelligence at all - the gap widens between experts and the rest.
I sometimes wonder what would have happened if OpenAI had built GPT3 and then GPT-4 and NOT released them to the world, on the basis that they were too dangerous for regular people to use.
That nearly happened - it's why OpenAI didn't release open weight models past GPT2, and it's why Google didn't release anything useful built on Transformers despite having invented the architecture.
If we lived in that world today, LLMs would be available only to a small, elite and impossibly well-funded class of people. Google and OpenAI would solely get to decide who could explore this new world with them.
I think that would suck.
So… what?
With all due respect I don’t care about an acceleration in writing code - I’m more interested in incremental positive economic impact. To date I haven’t seen anything convince me that this technology will yield this.
Producing more code doesn't overcome the lack of imagination, creativity and so on needed to figure out which projects resources should be invested in. This has always been an issue, and it will compound at firms like Google, which have an expansive graveyard of projects laid to rest.
In fact, in a perverse way, all this 'intelligence' can exist while humans simultaneously get worse at making judgments about investment decisions.
So broadly where is the net benefit here?
You mean the net benefit in widespread access to LLMs?
I get the impression there's no answer here that would satisfy you, but personally I'm excited about regular people being able to automate tedious things in their lives without having to spend 6+ months learning to program first.
And being able to enrich their lives with access to as much world knowledge as possible via a system that can translate that knowledge into whatever language and terminology makes the most sense to them.
“I'm excited about regular people being able to automate tedious things in their lives without having to spend 6+ months learning to program first.”
Bring the implicit and explicit costs to date into your analysis and you should quickly realise none of this makes sense from a societal standpoint.
Also you seem to be living in a bubble - the average person doesn’t care about automating anything!
The average person already automates a lot of things in their day to day lives. They spend far less time doing the dishes, laundry, and cleaning because parts of those tasks have been mechanized and automated. I think LLMs probably automate the wrong thing for the average person (i.e., I still have to load the laundry machine and fold the laundry after) but automation has saved the average person a lot of time
For example, my friend doesn’t know programming but his job involves some tedious spreadsheet operations. He was able to use an LLM to generate a Python script to automate part of this work. Saving about 30 min/day. He didn’t review the code at all, but he did review the output to the spreadsheet and that’s all that matters.
His workplace has no one with programming skills, this is automation that would never have happened. Of course it’s not exactly replacing a human or anything. I suppose he could have hired someone to write the script but he never really thought to do that.
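The actual script isn't shown here, but something of roughly this shape (a pandas sketch with invented file and column names) is the kind of thing a non-programmer can now get an LLM to produce and then judge purely by checking its output:

    import pandas as pd

    # Hypothetical daily chore: summarize order amounts by region.
    orders = pd.read_excel("daily_orders.xlsx")
    summary = (
        orders.groupby("region", as_index=False)["amount"]
        .sum()
        .sort_values("amount", ascending=False)
    )
    summary.to_excel("daily_summary.xlsx", index=False)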
What sorts of things will the average, non-technical person think of automating on a computer that are actually quality-of-life-improving?
My favorite anecdotal story here is that a couple of years ago I was attending a training session at a fire station and the fire chief happened to mention that he had spent the past two days manually migrating contact details from one CRM to another.
I do not want the chief of a fire station losing two days of work to something that could be scripted!
I don't want my doctor to vibe-script some conversion only to realize weeks or months later that it made a subtle error in my prescription. I want both of them to have enough funds to hire someone to do it properly. But wanting is not enough, unfortunately...
> Also you seem to be living in a bubble - the average person doesn’t care about automating anything!
One of my life goals is to help bring as many people into my "technology can automate things for you" bubble as I possibly can.
I'm curious about the economic aspects of this. If only experts can use such tools effectively, how big will the total market be and does that warrant the investments?
For companies, if these tools make experts even more special, then experts may get more power certainly when it comes to salary.
So the productivity benefits of AI have to be pretty high to overcome this. Does AI make an expert twice as productive?
I have been thinking about this in the last few weeks. First time I see someone commenting about it here.
- If the number of programmers is drastically reduced, how big a price increase would companies like Anthropic need to be profitable?
- If you are a manager, you now have a much bigger bus-factor problem to deal with. One person leaving means a greater blow to the team's knowledge.
- If the number of programmers is drastically reduced, the need for managers and middle managers will also decline, no? Hmm...
You can apply the same logic to all technologies, including programming languages, HTTP, cryptography, cameras, etc. Who should decide what's a responsible use?
> We recently got a PR from somebody adding a new feature and the person said he doesn't know $LANG but used AI.
"Oh, and check it out: I'm a bloody genius now! Estás usando este software de traducción in forma incorrecta. Por favor, consulta el manual. I don't even know what I just said, but I can find out!"
... And with this level of quality control, is it still faster than writing it yourself?
Are you just generating code with the LLM? Ya, you are screwed. Are you generating documentation and tests and everything else to help the code live? Then your options for maintenance go up. Now just replace "generate" with "maintain" and you are basically asking the AI to make changes to a description at the top that then percolate into multiple artifacts being updated, only one of which happens to be the code itself, with the code updated multiple times as the AI checks tests and so on.
I wish there were good guides on how to get the best out of LLMs. All of these tips about adding documentation etc seem very useful but I’ve never seen good guides on how to do this effectively or sustainably.
It is still early days; everyone has their own process, and a lot of it is still ad hoc. It is an exciting time to be in the field, though: before turnkey solutions arrive, we all get to be explorers.
Would it not be a new paradigm, where the generated code from AI is segregated and treated like a binary blob? You don't change it (beyond perhaps some cosmetic, or superficial changes that the AI missed). You keep the prompt(s), and maintain that instead. And for new changes you want added, the prompts are either modified, or appended to.
Sounds like a nondeterministic nightmare
indeed - https://www.dbreunig.com/2026/01/08/a-software-library-with-... appears to be exactly that - the idea that the only leverage you have for fixing bugs is updating prompts (and, to be fair, test cases, which you should be doing for every bug anyway) is kind of upsetting as someone who thinks software can actually work :-)
(via simonw, didn't see it already on HN)
There is a related issue of ownership. When human programmers make errors that cost revenue or worse, there is (in theory) a clear chain of accountability. Who do you blame if errors generated by LLMs end up in mission critical software?
> Who do you blame if errors generated by LLMs end up in mission critical software?
I don't think many companies/codebases allow LLMs to autonomously edit code and deploy it, there is still a human in the loop that "prompt > generates > reviews > commits", so it really isn't hard to find someone to blame for those errors, if you happen to work in that kind of blame-filled environment.
The same goes for contractors, I suppose: if you outsource work to a contractor and they do a shitty job but it gets shipped anyway, who do you blame? Replace "contractor" with "LLM" and I think the answer remains the same.
I have AI agents write, perform code review, improve and iterate upon the code. I trust that an agent with capabilities to write working code can also improve it. I use Claude skills for this and keep improving the skills based on both AI and human code reviews for the same type of code.