Experienced senior developers can spot and fix the slop instantly, while still getting a 30x productivity gain. Entry-level or junior devs basically only have one "filter" by which they determine code quality, which is: "Does the code seem to work?"

Unfortunately, "slop" will appear to work often enough to fool a junior.

Also, the reason junior devs get "slop" is that their prompts are "slop". They don't know the right terminology for things, nor do they have the writing/language skills necessary for good prompting.

EDIT: Since everyone is checking my math, I corrected this to 30x, which is what's provable from past experience.

30x productivity gain? gtfo of here.

Most things I try to use it for, the output has so many problems that I get at most a 50% productivity gain after fixing everything.

I'm already super efficient at editing text with neovim so honestly for some tasks I end up with a productivity loss.

I can easily get a month of work done in a single day, yes. So 30x is probably about the current max, and 50x was hyperbole; I didn't add it up before making that post.

I just don't believe this. It's weird; I just don't know where folks are getting these extreme productivity gains from.

For example, the other day I asked a major LLM to generate a simple markdown viewer with automatic section indentation for me in Node.js. The basic code worked after a few additional prompts from me.

Now I wanted folding. That was also done by the LLM. But when I tried to add a few additional simple features, things fell apart. There were one or two seemingly simple runtime errors that the LLM was unable to fix after almost 10 tries.

I could fix it if I started digging inside the code, but then the productivity gains would start to slip away.
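For reference, the kind of "automatic section indentation" logic being asked for could be sketched roughly like this (a hypothetical minimal sketch, not the LLM's actual output; it assumes plain `#`-style ATX headings, and the function names are mine):

```javascript
// Heading depth of a markdown line: 1 for "# ...", 2 for "## ...", 0 for body text.
function headingLevel(line) {
  const m = /^(#{1,6})\s/.exec(line);
  return m ? m[1].length : 0;
}

// Indent each line under its most recent heading so sections nest visually.
// Headings at depth h get (h - 1) indent units; body text gets the current depth.
function indentLines(lines) {
  let depth = 0;
  return lines.map((line) => {
    const h = headingLevel(line);
    if (h > 0) depth = h;
    return "  ".repeat(h > 0 ? h - 1 : depth) + line;
  });
}
```

Even at this toy scale there are edge cases (fenced code blocks containing `#`, setext headings, skipped heading levels) where the "few additional simple features" tend to interact and break.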

I'll spend maybe 10 minutes crafting a prompt that explains a new feature to be added to my app. I explain it in enough detail, with zero ambiguity, that any human [senior] developer could do it. Often the result is hundreds of lines of generated code, and well over 95% of the time the code Claude 4 generates is exactly what I wanted.

I'm using VSCode GitHub Copilot in "Agent Mode", btw. It's able to navigate around an entire project, understand it, and work on it. You just lean back and watch it open files, edit them, and show you its thought process in real time as it does everything. It's truly like magic.

Any other way of doing development, in 2025, is like being in the stone ages.

Your response does not address the example I gave. Sure, if what you are doing is a variation on something that's been done to death, then an LLM is faster at cutting and gluing boilerplate together across multiple files.

Anything beyond that, and LLMs require a lot of hand-holding, and frequently regress to boot.

I can't tell you how many times I've seen people write shoddy ambiguous prompts and then blame the LLM for not being able to read their minds.

If you write a prompt with perfect specificity as to what you want done, an agent like "GitHub Copilot + Claude" can work at about the same level as a senior dev. I do it all day long. It writes complex SQL, complex algorithms, etc.

Saying it only does boilerplate well reminds me of my mother who was brainwashed by a PBS TV show into thinking LLMs can only finish sentences they've seen before and cannot reason thru things.

You're still talking past my points. Look at the example I gave. Does it seem like the problem was due to an ambiguous prompt?

Even if my prompt was ambiguous, the LLM has no excuse for producing code that doesn't type-check, or that crashes in an obvious way when run. Ambiguity should affect what the code tries to do, not its basic quality.

And your use of totalizing phrases like "zero ambiguity" and "perfect specificity" tells me your arguments are somewhat suspect. There's no such thing as "zero" or "perfect" as far as architecting and implementing code goes.

Here's how I define zero ambiguity and perfect specificity: if I gave the same exact prompt wording to a human, would there be any questions they'd need to ask me before starting the work? If they'd need to ask a clarifying question before starting, then I wasn't clear; otherwise I was. If you want to balk at phrases like "perfectly clear", you're just nitpicking at semantics.

I've worked with some pretty smart people in my career and I've never met anyone who could do "instant" code review.

Actual code review is very slow. More often than not, you're just looking for glaring mistakes, not checking that the code actually respects the specifications. Which results in the LGTM comment, because you trust the other person's experience. In very critical systems, change is very slow to get in.

The "instant" I was referring to meant that I can tell instantly whether the LLM generated what I wanted or not.

That doesn't mean it's reviewed; it means I'm accepting it to _BE_ what I go with and ultimately review.

Ruling out or refining an approach on the grounds that it's unlikely to lead to a suitable outcome (fixing and removing slop) is not the same as saying that this code or approach represents a good-enough outcome given what we currently know about the constraints of the problem (code review).

Whenever I can't just sit down and bash out code, it's because the design is wrong. These models are bad at design. I don't see where your 30×–50× could possibly come from.

Most of the time, the only reason I have the code open is to read it. If not for the huge amount of code, I could just print it out and read it on my sofa.

If I'm dealing with a difficult to implement algorithm, a whiteboard is a better help than bashing out code.

That 30x math simply comes from spending 5 minutes typing a prompt and getting code generated that would take a human 2.5 hours to write. This means that in the future most of a developer's time will be spent reviewing code rather than typing it. And because AI will also be able to write the test cases, that effort [mostly] vanishes as well.
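Spelled out, that 30x is just the ratio of the two times:

```javascript
// The claimed speedup as plain arithmetic:
// 5 minutes of prompting vs. 2.5 hours (150 minutes) of hand-coding.
const promptMinutes = 5;
const humanMinutes = 2.5 * 60; // 150
const speedup = humanMinutes / promptMinutes;
console.log(`${speedup}x`); // "30x"
```

Note this ratio only holds if review and fix-up time is negligible, which is exactly the point under dispute in this thread.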

Unless your job is producing disposable software (e.g. single-use mobile games for short marketing campaigns), this comment suggests you don't know how to do your job. If a piece of the program takes 5 minutes to describe, but 2½ hours to write, you're spending your time in the wrong place, producing code that's legacy almost on day 1. Quoth https://quoteinvestigator.com/2014/03/29/sharp-axe/:

> The text presents to the wood cutter the alternative either to spend time in sharpening his axe, or expend his strength in using a dull one. Which shall he do? Wisdom is profitable to direct.

Sure, you don't need to sharpen your axe. Given a powerful internal combustion engine, you could drive a tank through the forest and fell many trees in rapid succession. But this strategy doesn't leave you with quality lumber, and leaves a huge mess for whoever comes after you (which may be your future self), and one day there won't be any trees left.

If your job is producing disposable software, be aware that you're using unpaid labour to do so. Some of the programmers who produced that AI's training data are struggling to eat and keep a roof over their heads. Act accordingly.

The 5-minute example is a maximum/extreme case, yes. My average time spent writing each prompt is probably 30 seconds or less, and the coding time saved per prompt is something like 25 to 60 minutes.

When I do spend minutes (not seconds) writing prompts, it's because I'm actually typing a "context file" that describes with full clarity certain aspects of my architecture relevant to an agent's task set. This context file might contain constraints and rules I want the agent to follow, so I type it once and reference it from maybe 10 to 20 prompts. I also keep the prompt files as an archive, so I can always go back and see what my original thoughts were. The context files also help me with system documentation later.

>while entry level or junior devs basically only have one "Filter" by which they determine code quality which is: "Does the code seem to work?".

Based on my general experience with software over the last... 30 years, most places must only have entry-level and junior devs. Somehow, despite 30 years of hardware improvement, basic software apps are still as clunky and slow as their '90s counterparts.

The only thing more reckless than a junior is an LLM-empowered junior.

> getting a 30x to 50x productivity gain

That is an absurd claim.

If you get a 30x gain then you're a 0.05x developer.

A 50x gain would literally mean you could get a year's worth of work done in a week. Preposterous.

Bad/dumb developers don't get much of a boost, in my experience working with a plethora of shitty contractors. Good developers aren't getting a 30x boost, I don't think, but they are getting more out of the tooling than bad developers.

The bottleneck is still finding good developers, even with the current generation of AI tooling in play.

It was when I started using GitHub Copilot in "Agent Mode" that my LLM productivity gains went from about 5x to 30x. People who are just using a chatbot get maybe 5x gains. People who use "Agent Mode" to write up a description of a new feature that would take a human several days, and get it done in one click by an agent, are getting 30x or more.

The amount of pushback I got on this thread tells me most devs simply haven't started using actual Agents yet.

I’ve tried using agents. LLMs just can’t reliably accomplish the tasks that I have to do. They just get shit wrong and hallucinate a ton. If I don’t break the task down into tiny chunks then they go off the rails.

This can definitely happen, because the context window, even in a great agent, can become flooded. I often do prompts like "Add a row of buttons at the top right named 'copy', 'cut', and 'paste'", and let the agent do just that before I implement each button, for example.

The rule of thumb I've learned is to give an Agent the smallest possible task at a time, so there's zero ambiguity in the prompt, and context window is kept small.

One good prompt into GitHub Copilot "Agent Mode" (running Claude 4) asking for a new feature can often result in 5 to 7 files being generated and a total of 1000 lines of code being written. Your math is wrong. That's hours of work I didn't do, and it took only the time needed to describe the new feature in a paragraph of text.

It's ridiculous to equate lines of code to amount of engineering work or value.

A massive amount of valuable work can result in a few lines of code. Conversely, a million lines of code can be useless or even have negative value.

It's all about the quality of your prompts (i.e. your skill at writing clear unambiguous instructions with correct terminologies).

An experienced developer can generate tons of great code 30x faster with an Agent, with each function/module still being written using the least amount of code possible.

But you're right, the measure of good code isn't N, it's 1/N (its inverse), where N is the number of lines of code needed to do something. The best code is [almost] always the code with the fewest lines, as long as you haven't sacrificed readability in order to remove lines, which I see a lot of juniors do. The rule of thumb is: "the least amount of easily understood LOC". If someone can't look at your code for the first time and tell what it's doing, that's normally an indication it's not good code. Claude [almost] never breaks any of these rules.
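As a toy illustration of that "least amount of easily understood LOC" rule (a hypothetical example; the function names are mine, not from the thread):

```javascript
// Same behavior, two ways. The second wins on "least amount of easily
// understood LOC"; a version golfed onto one cryptic line would not.

// Verbose version a junior might write:
function sumOfEvensVerbose(nums) {
  let total = 0;
  for (let i = 0; i < nums.length; i++) {
    if (nums[i] % 2 === 0) {
      total += nums[i];
    }
  }
  return total;
}

// Shorter and still obvious at first read:
const sumOfEvens = (nums) =>
  nums.filter((n) => n % 2 === 0).reduce((a, b) => a + b, 0);
```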

> Claude [almost] never breaks any of these rules.

Well it does for me, frequently. An example is here: https://news.ycombinator.com/item?id=44126962

Not sure how Claude fails frequently for you, but everybody I know says it rarely fails. I'm definitely not claiming it's perfect tho.

Did you look at the link I sent?

What's your point?

What tech stack are you using? It matters a lot what tech you are using when it comes to how effective the LLMs are.

I'm using VSCode with GitHub Copilot, which has an "Agent Mode". It proactively reads thru your project files to understand the project, but imo you still have to give it pretty precise instructions to get what you want.

[deleted]

30x-50x :)

Right, if you're getting that, experienced senior is a pretty wild stretch.

On the latest 'All In' podcast today, one of the billionaire "Besties" said the coding productivity boost was like "20x to 50x". Probably just reads my HN.