I'm constantly thinking about that Microsoft guy who posted something like "we want 1 million LoC per engineer per month", which basically read as satire to most engineers I talked to, except apparently it was not satire at all, and indeed seemed to reflect the position of many CEOs and other executives when it comes to LLM code generation.
I do think that over the past few months, it feels like the hype around producing unmaintainable amounts of LoC has started dying down. More pragmatic and realistic takes are seemingly shared more openly, and are maybe even getting through to top leadership at some tech companies. Maybe not all is lost yet.
I once worked in a company where there was an 80% code coverage requirement. Some enterprising contractor had a script that generated a single file, along with a test suite covering it, whose size could be tuned so that coverage across the whole codebase reached 80%. The real code was mostly untested.
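For anyone curious how big that generated file has to be, here is a rough sketch of the arithmetic. It assumes line coverage is simply covered lines divided by total lines across the whole codebase, and that every filler line is executed by its own generated tests; the helper name is hypothetical, not the contractor's actual script. If the real code has `total_lines` lines of which `covered_lines` are covered, adding p fully covered lines gives (covered + p) / (total + p), and solving for the target gives the minimum p.

```python
def padding_lines_needed(total_lines: int, covered_lines: int, target_percent: int = 80) -> int:
    """Return how many fully covered filler lines push overall coverage to target_percent.

    Hypothetical helper: assumes coverage = covered / total across the whole
    codebase, and that every filler line is executed by its own generated tests.
    """
    if not 0 <= target_percent < 100:
        raise ValueError("target_percent must be in [0, 100)")
    if not 0 <= covered_lines <= total_lines:
        raise ValueError("covered_lines must be between 0 and total_lines")
    # (covered + p) / (total + p) >= target / 100
    #   =>  p >= (target * total - 100 * covered) / (100 - target)
    numerator = target_percent * total_lines - 100 * covered_lines
    if numerator <= 0:
        return 0  # the real code already meets the target
    # Ceiling division in integers avoids floating-point rounding surprises.
    return -(-numerator // (100 - target_percent))
```

With these assumptions, a 100,000-line codebase with only 10,000 covered lines needs 350,000 lines of filler to hit 80%, which is why the trick only works if nobody looks at the line counts.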
And thanks to AI, we can now generate extremely convincing reams of code whose only purpose is to be covered by fake unit tests. Amazing. I sincerely hope I never need to use this nuclear weapon.
The word “slop” was a good choice to talk about the mass of code generated by AI. I think it resonates with non-tech people and it conveys disgust. It’s clear that we should avoid slop.
“Technical debt” never hooked management in the same way, and we have found it hard to convince them that it needs to be addressed. Debt in general is something that can be a problem, but doesn’t need to be avoided or addressed until it actually is a problem, so the can gets kicked down the road.
To be fair, they are also different things, though there is certainly overlap...
To me, tech debt captures the idea that we cut corners now to move faster, with the understanding that it will need to be "re-paid" and cleaned up later, otherwise we take on too much tech debt, and everyone knows too much debt is bad...
AI slop code means people feed their tasks to a model, trust it to drive the changes, maybe do some cosmetic cleanups, generate a three-page PR description they didn't even read themselves, and toss it over to the code reviewer: let that chump figure out what the hell I was doing while I ship 3-4 more PRs...
Technical debt is an indefinable quantity, which makes it very prone to being abused to mean "I wish I could rewrite this in [insert some fashionable language, framework or coding style]".
AI slop is an easier concept to quantify. It's basically the code for which insufficient people in the organisation have a meaningful understanding of how it works or what it does.
> It's basically the code for which insufficient people in the organisation have a meaningful understanding of how it works or what it does.
Its connotation also includes being vastly larger than needed for the purpose it serves, _if_ there is even any purpose.
> which basically read as satire to most engineers I talked to
Seemingly, engineers get this wrong too. I'm reminded of when Cursor bragged about how many lines of code a group of agents could produce, with the underwhelming result of a barely working browser, when the same thing could have been built with much less code.
But they highlighted the amount of code because they were proud of how much slop their constellation of agents had shit out, and these were supposedly engineers. Really strange to see.
“Less is better” is sort of… the position of the engineer who enjoys the craft of programming, right? I don’t think this is universally believed.
And anyway, I’m pretty sure what people really mean by this “less is better” mantra is: the lowest amount of code that still accomplishes the goal and is still readable is preferred. Linux apparently has 40M lines of code, and I bet most of it is better than mine. Some things just take lots of code.
Which seems to leave room for these agent salesmen to pitch SLoC as a plus. We just have to believe those lines are all good ones. In that case, it would be impressive. I don’t believe it, but they are probably pitching to people who do.
> “Less is better” is sort of… the position of the engineer who enjoys the craft of programming, right?
No, it's the perspective of a programmer who doesn't want the project bogged down in so much technical debt that every change gets slower and slower to implement as everything becomes more intermingled. A clean design lets you keep moving fast for a long time, compared to a design that is fast to implement but makes it hard to move forward properly later without resorting to shortcuts and/or hacks.
> Some things just take lots of code.
True. Rich Hickey does a good job differentiating between what's complicated because the domain is complicated vs. what's complicated because the implementation just ended up that way, even though with some more thought and design it could have been made a lot simpler.
> “Less is better” is sort of… the position of the engineer who enjoys the craft of programming, right? I don’t think this is universally believed.
I think it is (or should be) a goal-oriented and business-oriented concern as well, not just the concern of an engineer who enjoys their craft.
More complex systems are worse than simpler systems (that accomplish the same thing) in cost, maintenance, fragility, ease of understanding, etc. Fewer moving parts usually result in higher reliability, fewer things that can break down or fail to interact properly, etc. That's a business concern too, not just engineering craftsmanship or whatever. Business people should care about this too.
I don't think this is the same as bikeshedding over irrelevant details, something we software engineers are often prone to. Monstrous complexity does impact the business!
It's like we've all forgotten what technical debt means. We just say the phrase, but we have forgotten that it is analogous to actual debt. Every line of code produced should be treated as a liability to the company, like a bond they issued that they have to pay interest on in the future. You only take on the liability if it produces more business value than it costs to maintain. The goal is not to issue as many bonds as you possibly can.
> I do think that over the past few months, it feels like the hype around producing unmaintainable amounts of LoC has started dying down.
I wonder if a small part of this is more and more business and product people actually trying to incorporate AI into their daily workflows. I have seen this in both small companies I work for. People were very excited about getting Claude Cowork a couple of months ago, and while they use it daily, I would say they are rather underwhelmed compared to the magic they were expecting. Complaints include the output being mediocre and verbose, it getting the most basic things wrong, hitting token limits all the time, and people going back to doing things themselves because it is faster.
Sure, there is some degree of holding it wrong in the beginning, but people are realizing that maybe, just maybe, there is still something of a gap between reality and what AI CEOs, LinkedIn grifters, and YouTube AI supplement peddlers claim.
I suspect this is it. I'm 40, and the only tech person in my social circle. Many of my friends were excited about using it for things like basic webdev and home networking. One-shotting that type of stuff is very viable even if you don't know anything about the topic. Now that they are trying to use it for something they actually know about, suddenly it's unusable. It's a variant of Gell-Mann Amnesia.
I had an MoM at Stripe who pushed back on perf designations based on number of PRs.
I wish I were joking.
(They had never been an engineer.)
It's a signal. It's not a strong signal, and you certainly should not base your entire perf on it, but if the number is unusually high or low, it's a signal that could warrant further investigation.
(I once worked with an engineer who had two PRs, both fairly small bug fixes, in a given calendar year, and when I looked more carefully, they did not have any other obvious output or impact.)
Trying to parse your sentence, which is ambiguous...
You're saying that the manager-of-managers would argue that the number of PRs should affect perf ratings? Or the MoM would push back against the line managers who were giving ratings based on # of PRs?
They were reviewing perf designations, then pulling up PR counts, then arguing against designations based on the number of PRs opened.
That still doesn't clarify: were they saying "many PRs→good" or "many PRs→bad" or "number of PRs is irrelevant" or...?
That PRs == impact.
I think the reliability struggles of Github may have helped with this
I can't help but wonder if the causation is backwards here and the millions of lines of slop had more to do with the Github struggles than the reverse
In reality, yes, and probably a complex mixture of things: dedicated time and resources being siphoned off for LLM work, etc.
I also think starting to migrate to Azure just as their traffic/usage exploded from LLM use (plus, I assume, merging a bunch of poorly written early-gen LLM code as early-adopter dogfooding) was poor planning by Github/Microsoft.
It's not unmaintainable if you have 1000 agents maintain it.
It is unmaintainable even if you spend 100k per month on tokens to have LLMs pretend they are maintaining it, if they slow things down and make little ACTUAL progress. Sadly, real progress is impossible to measure if all you have is an overexcited """engineer""", a credit card, and so much cash spent that you could hire all the best engineers you know and still have money for a Porsche.
Well, software presumably has a goal of accomplishing something for some end-user, so the progress should be trivial to measure: are features/changes being completed?
The marketing ploys of OpenAI/Anthropic where agents build something that nobody uses might be hard to track given that there are zero users. But what about everyone using agents for real software? It's trivial to prove that agents make progress.
It's not unmaintainable if most of it is tests. Just have it write tests until it becomes safe for AI.
I hate that I can't tell if you're joking without checking your posting history lol
All else being equal, and assuming you are building the right thing, being able to deliver more correct lines of code is a good thing. The question is how to do it reliably, given that a human cannot possibly read all of it. The answer seems to me to involve spot checks with proofs of correctness and statistical quality control, the latter being things that can be automated. One issue I see is that the models are constantly changing and are therefore not well understood statistically.
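One way the "statistical quality control" half could be automated is classic acceptance sampling: review a random sample of generated changes thoroughly, count defects, and accept the batch only if an upper confidence bound on the defect rate is below a threshold. The sketch below uses a one-sided Clopper-Pearson bound computed by bisection; the function names, the 95% confidence level, and the threshold are illustrative assumptions, and it assumes changes are sampled uniformly at random and that "defect" is whatever careful reviewers agree it is.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def defect_rate_upper_bound(defects: int, sample_size: int, confidence: float = 0.95) -> float:
    """One-sided Clopper-Pearson upper bound on the true defect rate.

    Finds p_upper such that P(X <= defects | p_upper) == 1 - confidence.
    The binomial CDF decreases as p grows, so bisection on [0, 1] works.
    """
    if defects >= sample_size:
        return 1.0
    alpha = 1 - confidence
    lo, hi = 0.0, 1.0
    for _ in range(60):  # 60 halvings is far below float resolution
        mid = (lo + hi) / 2
        if binom_cdf(defects, sample_size, mid) > alpha:
            lo = mid  # observing this few defects is still plausible: bound is higher
        else:
            hi = mid
    return hi


def accept_batch(defects: int, sample_size: int, max_defect_rate: float,
                 confidence: float = 0.95) -> bool:
    """Hypothetical acceptance rule: accept only if we are `confidence`-sure
    the batch's defect rate is at most max_defect_rate."""
    return defect_rate_upper_bound(defects, sample_size, confidence) <= max_defect_rate
```

For example, zero defects in 30 sampled changes only bounds the defect rate at roughly 9.5% with 95% confidence, so tight quality targets need large samples. And, as the comment notes, the models keep changing: a bound like this only says something about the model and workflow the sample was drawn from.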
>All else being equal, and assuming you are building the right thing, being able to deliver more correct lines of code is a good thing.
Why? If you can deliver the same thing in fewer correct lines of code, wouldn't that be preferable? At a bare minimum, if you're still insisting on using AI to slop out your project, having it do things in fewer lines of code means you can fit more into your LLM's context window.
> If you can deliver the same thing in fewer correct lines of code
It really depends on what you're doing. If your goal is "become interoperable with the N different and incompatible network protocols that people have devised for doing task X" I'd really like to know a solution that doesn't have at least some part of the amount of code that scales with N.
Example: consider https://bitfocus.io/connections, which connects to 700 different things. Right now it's written in Node.js, with one repo per connection (example: https://github.com/bitfocus/companion-module-meyersound-gala...). Let's say you want to make a similar product that runs on an ESP32, where performance is paramount, so you need C++ or Rust. How do you do that without at least as many lines of code as the existing JS implementations for every system supported by Companion?
This is still not an argument for more lines of code. It demonstrates that lines of code are positively correlated with number of features, yes. But that's like saying the number of nails scales with the size of a house. More nails does not create more house.
Without looking at the details, I expect that each network protocol has a checksum of some form, and there are likely far fewer than N different checksum algorithms. Similarly, I expect several will have encryption using one of a few standard algorithms (if one doesn't use a standard algorithm, you have a strong case for declaring it unsupported). I also expect that there is a lot of protocol parsing: this can be hand-coded for each protocol, or done with a parsing framework (and there are likely some places for generic code in between).
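A minimal sketch of that split, assuming a handful of shared checksum routines and a small declarative description per protocol. The protocol names, headers, and checksum choices below are made up for illustration; they are not taken from Bitfocus Companion or any real device.

```python
import zlib
from dataclasses import dataclass
from typing import Callable


# Shared code, written once: a few checksum algorithms reused by many protocols.
def xor8(data: bytes) -> bytes:
    acc = 0
    for b in data:
        acc ^= b
    return bytes([acc])


def sum8(data: bytes) -> bytes:
    return bytes([sum(data) & 0xFF])


def crc32_be(data: bytes) -> bytes:
    return zlib.crc32(data).to_bytes(4, "big")


CHECKSUMS: dict[str, Callable[[bytes], bytes]] = {
    "xor8": xor8,
    "sum8": sum8,
    "crc32": crc32_be,
}


# Per-protocol part: this still grows with N, but each entry is small.
@dataclass(frozen=True)
class ProtocolSpec:
    name: str
    header: bytes
    checksum: str  # key into CHECKSUMS


def build_frame(spec: ProtocolSpec, payload: bytes) -> bytes:
    """Generic framing: header + payload + checksum over both."""
    body = spec.header + payload
    return body + CHECKSUMS[spec.checksum](body)


def verify_frame(spec: ProtocolSpec, frame: bytes) -> bool:
    """Generic check: correct header and a matching checksum trailer."""
    check = CHECKSUMS[spec.checksum]
    size = len(check(b""))  # every checksum here has a fixed output length
    if len(frame) < len(spec.header) + size:
        return False
    body, trailer = frame[:-size], frame[-size:]
    return body.startswith(spec.header) and check(body) == trailer


# Hypothetical devices, for illustration only.
PROTOCOLS = [
    ProtocolSpec("device_a", b"\x02", "xor8"),
    ProtocolSpec("device_b", b"AB", "sum8"),
    ProtocolSpec("device_c", b"\xaa\x55", "crc32"),
]
```

The per-protocol description shrinks to a data entry, but it does not disappear: each real device still needs its own commands and parsing, which is the part that still scales with N.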
Parent said "I'd really like to know a solution that doesn't have at least some part of the amount of code that scales with N."
You're arguing the inverse: that at least some parts of the code are independent of N. Sure. But the topic is the part that isn't.
Then you simply produce those fewer lines of code even faster. The question is, how fast are you delivering correct code?
Moreover, writing overly terse code harms readability and maintainability. There is such a thing as irreducible complexity.
> I'm constantly thinking about that Microsoft guy who posted something like "we want 1 million LoC per engineer per month", which basically read as satire to most engineers I talked to
Did those engineers not actually read the complete tweet? Because it wasn't about "engineers should write 1M LOC per month of product code" it was "we want to scale automated porting of code to safe languages so that 1 engineer managing 1M LOC of automated conversion can work". Which doesn't seem like satire at all..? It just means "develop mostly reliable AI-driven refactoring tools with good guard rails". Which seems quite sensible, actually?
I don't care: porting the current architecture, with all its known "I wish I had done this differently"s, doesn't gain much. Look at some developers I've worked with who love Rust for "safety", even though they put everything in unsafe at the first sign of trouble instead of thinking about how it should work safely.
Porting to a new language is easy, but does nothing useful. What we need is to fix the mistakes of the past so we can get to the future. We need to reach acceptable performance.
> Because it wasn't about "engineers should write 1M LOC per month of product code" it was "we want to scale automated porting of code to safe languages so that 1 engineer managing 1M LOC of automated conversion can work".
Making a grand claim of a goal without really explaining how to achieve it isn't much better. I could say "we want to scale food production so that one farmer could manage a million acres of corn a month", but that wouldn't really be sensible. A line of code is less work than an acre of corn, of course, but I don't think it's at all apparent what the upper bound is on how much code a single engineer can plausibly generate in a month and have any degree of confidence in. Given the absurd levels of hype around AI from non-engineering management in the past couple of years, it's not clear why the benefit of the doubt is earned here when there legitimately are managers and executives claiming pretty much exactly what you're claiming this guy wasn't.
Minor correction: LinkedIn, not Twitter. https://www.linkedin.com/posts/galenh_principal-software-eng...
> Because it wasn't about "engineers should write 1M LOC per month of product code" it was "we want to scale automated porting of code to safe languages so that 1 engineer managing 1M LOC of automated conversion can work"
These are one and the same. Whether it's ported code or not doesn't change that. The framing device also doesn't matter, because it's the exact "Oh it's our goal" shtick that executives use in the former's case.
"It's just a measure" doesn't cut it in a world where every single AI measure immediately gets turned into a target by executives greedy for efficiencies that don't exist.
EDIT:
Right, I forgot. This is HN where everyone is a galaxybrain and "Port a million lines of code per month" is a totally reasonable goal for a single individual.
I can easily game writing 1M LOC per month by having the LLM write code in more verbose ways, with useless indirections and abstractions thrown in for good measure. I could even ask Claude to write code that does nothing but take up lines.
In contrast, converting 1M LOC of code per month is a much more solid measure, as long as you measure LOC of the source, not the new code. Sure, in the short term you can pick the easy/verbose things to port, but it's hard to do sustainably. A 5M LOC code base would still be expected to be ported in 5 engineer months.
Granted, you can still rush the work, not test properly, neglect good planning and engineering. Ported lines of code should not be the only measure (just like with any other measure). But it's a much less problematic measure than writing 1M LOC.
> Granted, you can still rush the work, not test properly, neglect good planning and engineering.
Which is the core point of my reply and not something to just be casually handwaved, thank you very much.
If everything in the initial code is 300% covered by excellently documented tests that should change only minimally during the transition (and if the transition doesn’t reveal any corner cases the tests were missing, maybe it isn’t such a bright move after all), that seems like something worth considering.
Otherwise, it really sounds like a recipe for huge, unnecessary risk with a dubious expected upside.
I’m not saying don’t have fun, but maybe not with the core product of your cash cow?
> "we want to scale automated porting of code to safe languages so that 1 engineer managing 1M LOC of automated conversion can work". Which doesn't seem like satire at all..?
Because many programmers don't believe that'd work. See the reaction to Bun's port to Rust. (I bet Bun will work and prove those programmers wrong, but that's another story.)