It’s a metaphor. With enough oversight, a qualified engineer can get good results out of an underperforming (or extremely junior) engineer. With a junior engineer, you give the oversight to help them grow. With an underperforming engineer, you hope they grow quickly, or you eventually terminate their employment because it’s a poor time trade-off.
The trade-off with an LLM is different. It’s not actually a junior or underperforming engineer. It’s far faster at churning out code than even the best engineers. It can read code far faster. It writes tests more consistently than most engineers (in my experience). It is surprisingly good at catching edge cases. With a junior engineer, you drag down your own performance to improve theirs, often trading short-term productivity for a long-term payoff. With an LLM, your net performance goes up because it’s augmenting you with its own strengths.
As an engineer, it will never reach senior level (though future models might). But as a tool, it can enable you to do more.
> It’s far faster at churning out code than even the best engineers.
I'm not sure I can think of a more damning indictment than this tbh
Can you explain why that’s damning?
I guess everyone dealing with legacy software sees code as a cost factor. Being able to delete code is harder, but often more important than writing code.
Owning code requires you to maintain it. Finding out which parts of the code actually implement features and which parts are not needed anymore (or were never needed in the first place) is really hard, since most of the time the requirements were never documented and the authors have left or can't remember. But not understanding what the code does removes all possibility of improving or modifying it. This is how software dies.
Churning out code fast is a huge future liability. Management wants solutions fast and doesn't understand these long term costs. It is the same with all code generators: Short term gains, but long term maintainability issues.
Do you not write code? Is your code base frozen, or do you write code for new features and bug fixes?
The fact that AI can churn out code 1000x faster does not mean you should have it churn out 1000x more code. You might have a list of 20 critical features and only have time to implement 10. AI could let you get all 20, but that doesn’t mean you should check in code for 1000 features you don’t even need.
Sure if you just leave all the code there. But if it's churning out iterations, incrementally improving stuff, it seems ok? That's pretty much what we do as humans, at least IME.
Sure:
[1] https://saintgimp.org/2009/03/11/source-code-is-a-liability-...
[2] https://pluralistic.net/2026/01/06/1000x-liability/
I feel like this is a forest for the trees kind of thing.
It is implied that the code being created is for “capabilities”. If your AI is churning out needless code, then sure, that’s a bad thing. Why would you be asking the AI for code you don’t need, though? You should be asking it for critical features, bug fixes, the things you would be coding up regardless.
You can use a hammer to break your own toes or you can use it to put a roof on your house. Using a tool poorly reflects on the craftsman, not the tool.
> It writes tests more consistently than most engineers (in my experience)
I'm going to nit on this specifically. I firmly believe anyone who genuinely believes this either never writes tests that actually matter, or doesn't review the tests that an LLM throws out there. I've seen so many cases of people saying 'look at all these valid tests our LLM of choice wrote', only for half of them to do nothing and the other half to be misleading about what they actually test.
This has been my experience as well. So far, whenever I’ve been initially satisfied with the one-shotted tests, when I had to go back to them I realized they needed to be reworked.
It’s like anything else, you’ve got to check the results and potentially push it to fix stuff.
I recently had AI code up a feature that was essentially text manipulation. There were existing tests to show it how to write effective tests and it did a great job of covering the new functionality. My feedback to the AI was mostly around some inaccurate comments it made in the code but the coverage was solid. Would have actually been faster for me to fix but I’m experimenting with how much I can make the AI do.
On the other hand I had AI code up another feature in a different code base and it produced a bunch of tests with little actual validation. It basically invoked the new functionality with a good spectrum of arguments but then just validated that the code didn’t throw. And in one case it tested something that diverged slightly from how the code would actually be invoked. In that case I told it how to validate what the functionality was actually doing and how to make the one test more representative. In the end it was good coverage with a small amount of work.
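To make that “just validated that the code didn’t throw” pattern concrete, here’s a rough sketch with a made-up slugify helper (not the actual feature from either code base), showing the kind of test it produced first versus the kind it produced after I told it to validate the output:

```python
import pytest

def slugify(text: str) -> str:
    # toy implementation, just enough to make the tests runnable
    return "-".join(text.lower().split())

# The weak pattern: a decent spread of inputs, but the only implicit
# assertion is "it didn't throw".
@pytest.mark.parametrize("text", ["Hello World", "", "  spaced  out  ", "MixedCASE"])
def test_slugify_does_not_throw(text):
    slugify(text)  # passes even if every result is wrong

# After feedback: pin down the actual behaviour.
def test_slugify_output():
    assert slugify("Hello World") == "hello-world"
    assert slugify("  spaced  out  ") == "spaced-out"
    assert slugify("MixedCASE") == "mixedcase"
```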
For people who don’t usually test or care much about testing, yeah, they probably let the AI create garbage tests.
>feature that was essentially text manipulation
That seems like the kind of feature where the LLM would already have the domain knowledge needed to write reasonable tests, though. Similar to how it can vibe code a surprisingly complicated website or video game without much help, but probably not create a single component of a complex distributed system that will fit into an existing architecture, with exactly the correct behaviour based on some obscure domain knowledge that pretty much exists only in your company.
> probably not create a single component of a complex distributed system that will fit into an existing architecture, with exactly the correct behaviour based on some obscure domain knowledge that pretty much exists only in your company.
An LLM is not a principal engineer. It is a tool. If you try to use it to autonomously create complex systems, you are going to have a bad time. All of the respectable people hyping AI for coding are pretty clear that they have to direct it to get good results in custom domains or complex projects.
A principal engineer would also fail if you asked them to develop a component for your proprietary system with no information, but a principal engineer would be able to do their own deep discovery and design if they have the time and resources to do so. An AI needs you to do some of that.
I don't see anything here that corroborates your claim that it outputs more consistent test code than most engineers. In fact your second case would indicate otherwise.
And this also goes back to my first point about writing tests that matter. Coverage can matter, but coverage is not codifying business logic in your test suite. I've seen many engineers focus only on coverage, only for their code to blow up in production because they didn't bother to test the actual real-world scenarios it would be used in, which requires a deep understanding of the full system.
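A contrived sketch of what I mean (the 50% discount cap is an invented business rule, not from any real system): both tests below get full line coverage, but only the second one codifies the rule, so only it would catch a regression.

```python
def apply_discount(price: float, pct: float) -> float:
    pct = min(pct, 50)  # business rule: never discount more than 50%
    return round(price * (1 - pct / 100), 2)

# 100% line coverage, but the rule isn't codified: this still passes
# if someone deletes the cap.
def test_apply_discount_coverage_only():
    assert apply_discount(100, 60) > 0

# Same coverage, but removing the cap now fails loudly.
def test_apply_discount_business_rule():
    assert apply_discount(100, 60) == 50.0  # 60% requested, capped at 50%
    assert apply_discount(100, 10) == 90.0
```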
I still feel like in most of these discussions the criticism of LLMs is that they are poor replacements for great engineers. Yeah. They are. LLMs are great tools for great engineers. They won’t replace good engineers and they won’t make shitty engineers good.
You can’t ask an LLM to autonomously write complex test suites. You have to guide it. But when AI creates a solid test suite with 20 minutes of prodding instead of 4 hours of hand coding, that’s a win. It doesn’t need to do everything alone to be useful.
> writing tests that matter
Yeah. So make sure it writes them. My experience so far is that it writes a decent set of tests with little prompting, honestly exceeding what I see a lot of engineers put together (lots of engineers suck at writing tests). With additional prompting it can make them great.
I also find it hard to agree with that part. Perhaps it depends on what type of software you write, but in my experience finding good test cases is one of those things that often requires a deep level of domain knowledge. I haven’t had much luck making LLMs write interesting, non-trivial tests.