Very grateful to have had good self-learning, education, then mentoring before LLMs existed. And although they are such a boon to productivity, I worry what they will do to my ability to think on my own in the long term.
Like reaching for your phone out of habit the moment you are bored, I don't want to need an LLM any time I am faced with a problem. I want to exercise my own brain. I feel as though my ability to reason without them has already begun to degrade; my mind fogs more these days. I try to curb it by having conversations rather than just asking for solutions.
I don't care that the tool isn't going anywhere, but just like relying on calculators won't make you better at arithmetic, I don't think relying on LLMs will make you a better engineer.
I think if you are a self-directed learner in general, one who is drawn toward learning, it will magnify that tendency and lower the activation energy required to bootstrap new domains of knowledge.
But if you don’t like learning, and only do it because you have to, it will magnify that tendency and provide a way to avoid learning altogether.
We are likely to end up with a large subset of the population basically being meat puppets doing whatever their favorite flavor of LLM tells them to do.
Oh, totally agree. These LLMs are like TikTok on my phone in terms of addictiveness.
Ugh, I feel like this general topic has been beaten to death; every article inevitably references the same viral moments on Twitter and draws roughly the same conclusions.
But these articles get posted and upvoted cause we developers just eat that shit up (if I’m being honest I do at least, every time I see these kinds of posts I always smirk cause I know what the comments section is gonna be like).
Unfortunately it seems like the more you use AI, the dumber you become; I suspect these discussions will get more and more dramatic until we have two camps that don't even understand what the other side is saying anymore.
Appreciate this. But.
Do whatever you want. That’s an option too.
Make a dumb thing, take your hands off the wheel, have fun. It’s your computer.
Unfortunately, no, not when you are dealing with user data. You don't own that and you don't deserve to be reckless with it just because you're paying for the hardware and bandwidth.
The problem experienced folks are worried about is not concerning what kids do in their basement with the ‘puter they got for Christmas.
> Make a dumb thing, take your hands off the wheel, have fun
This is why we need licensing for software developers:
When you're building a service that has actual users, with actual data, and tangible consequences when it fails, "take your hands off the wheel, have fun" is fundamentally dangerous.
Or, to put it differently: It's totally fine for some kids to build a treehouse. They might even get hurt. But, when it comes to dams and bridges, there is a reason why the people who design those need to get a license.
Projects just for you, or for a very, very limited set of, let's say, your close friends? Sure, who cares.
For folks actually working in companies handling various data that don't belong to them? Oh god, please no, that's horrible advice.
Much like taking your hands off the wheel in a car, it's all fun and games and your choice until you crash into someone else.
I like how every reply to you is the same: nuance doesn't exist, and we're all working on missile guidance systems and pacemaker firmware.
There's such a wide range of software. There's plenty of space for an amateur to do some creative vibe coding. What's the point of the scolding and hand wringing?
People are conflating vibe coding for personal/hobbyist projects and vibe coding for production.
Evergreen tweet: https://knowyourmeme.com/photos/2659979-no-bitch-dats-a-whol...
Given the fact that the post mentions an actual company with actual users that was seemingly vibe coded, I don't think anyone pointing out that this is reckless is conflating anything here. It seems like some are better than others at reading from context though, clearly.
That's clearly not the danger. Make a dumb thing that takes user input (including PII or maybe other protected data), then put it online and charge people to use it without vetting it for security? No, let's not encourage that.
Articles like this make one think of John Henry vs. the steam drill. John Henry beat the steam drill, once, and died. Soon, there was a better steam drill.
"Vibe coding" is only a few months old. ChatGPT was released less than three years ago. The singularity is just getting started.
History of computer chess:
- 1957 - early programs that played chess very badly. Excessive optimism
- 1967 - programs that play amateur chess
- 1976 - first tournament win
- 1980s - top programs not much stronger, but now running on PCs.
- 1996 - first win against grandmaster
- 2005 - last loss by a top program against a grandmaster
- 2025 - all the good programs can trounce any human.
LLMs are probably at the 1996 level now.
Chess is a bad example. Even a "stupid" computer that is sufficiently powerful can just brute-force-search its way to a win. There's nothing special here; it's basically just deeper and deeper search. Put another way, the limitation was always about sufficiently powerful hardware.
I'm not sure the same can be said about LLMs.
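To make "deeper and deeper search" concrete, here's a minimal, hypothetical sketch of the depth-limited negamax loop classic engines are built around (the `legalMoves`, `applyMove`, and `evaluate` helpers are stand-ins, not any real engine's API):

    // Minimal depth-limited negamax sketch (illustrative only, not any real
    // engine's code). `legalMoves`, `applyMove`, and `evaluate` are hypothetical
    // stand-ins for a concrete board representation.
    type Move = string;
    interface Engine<Position> {
      legalMoves(pos: Position): Move[];
      applyMove(pos: Position, move: Move): Position;
      evaluate(pos: Position): number; // static score for the side to move
    }

    function negamax<P>(engine: Engine<P>, pos: P, depth: number): number {
      const moves = engine.legalMoves(pos);
      if (depth === 0 || moves.length === 0) {
        return engine.evaluate(pos); // leaf node: fall back to static evaluation
      }
      let best = -Infinity;
      for (const move of moves) {
        // The opponent's best reply, negated back into our point of view.
        const score = -negamax(engine, engine.applyMove(pos, move), depth - 1);
        best = Math.max(best, score);
      }
      return best;
    }

Stronger hardware mostly just lets `depth` grow (and funds smarter pruning); the loop itself hasn't changed much since the early programs.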
It seems a bit presumptuous to assume that software and hardware won't evolve past May 2025 to improve watts/token, or whatever metric you choose. Consumer-grade GPUs didn't really arrive until 1995, the industry didn't standardize on OpenGL until the early '90s, and consumer-grade GPUs didn't get OpenGL support until much later; Vulkan didn't come along until 2016. The fact that I can't buy a 4070 with 1TB of memory at Best Buy for $1200 is mostly an artificial limit, or will be in a year or two. I would expect watts/token to decrease by at least half by the end of the decade.
How do you not see that it's still just deeper and deeper search?
In a sense, yes, but my point is that it is not a given that making LLMs bigger and bigger will make them qualitatively much better than they currently are.
Did the 1957-level programs play moves that were illegal under the rules of the game, like moving a bishop horizontally? And did they play randomly?
Don't forget that in 1957, computer performance was much lower than today's. I wonder how a 1957 approach would fare on today's computers once you remove the design limitations that only existed because of those past constraints.
> This is not innovation. This is a security breach waiting to happen.
No, it is innovation. The problem is that innovation is often bad.
How specifically is innovation often bad? Innovation, like scientific discovery, is merely an expansion of the frontier of knowledge. Specific applications may be good or bad, but knowledge and processes are neutral.
But applications can themselves be innovations. Innovation isn't a concept that's restricted only to abstract knowledge.
Innovation usually creates losers, who formerly benefited from the less efficient system in place.
Every time I see someone on HN crowing about how great so-called "vibe" coding is, I can't help but think they must be doing the lowest, most basic types of coding.
I don't need AI to help me code. What I need AI to do is help me figure out new coding solutions. But all AI seems able to do is regurgitate things that other people have already done that it's ingested from the internet.
I'll ask AI how to do abc, within xyz parameters, with def available and ghi constraints. I typically get back one of two things:
1. A list of 20 steps to achieve abc that somewhere around the middle has a step that's the equivalent of "Then magic happens", or two to three steps that are entirely unrelated to one another or the project at hand.
2. A list of what should be 20 steps that suddenly ends at step 7, leaving the problem only half done.
Most frustrating is when the "AI" says to use $tool/$library, but $tool/$library is not available on the specified platform, or hasn't been updated since 2011 and no longer works. When I tell the AI this, it always responds with, "You are right, that tool is no longer available. Here's a list of even more broken steps you can take to work around it."
So far, for my coding needs, AI seems only able to regurgitate what's already been done and published by others. That's great, but there are search engines for that. I have novel problems, and until AI can actually live up to the "I" part of its name, it is worthless to me.
In my banking megacorp, despite my official title of senior software engineer, coding is maybe 10% of my time. And it's the best, most creative part, the part I actually enjoy. Why would I give that up? There's no real velocity gain even if everything were one click away in a flawless, production-ready state.
The real cruft of seniority is processes, knowing the right people and their buttons, politics, being there to fix obscure corner-case production issues, and so on. How can an LLM help me with that? It can't.
For code sweatshops they may be a blessing; for corporations drowning in regulations and the abysmal internal labyrinths of their IT, not so much.
Lol, are you me? Also a senior developer at a financial institution. I've maybe coded like 1000 lines in the last 2 months. I just got a ticket recently that required code and it felt like a weight lifted off my shoulders to finally be able to put hands to keyboard again.
Dang hello, me too. I recently became tech lead at our fintech co, and the few days a month I get to code is like vacation for my mind. I still remember the good ol' days where nobody talked to me and I solved problems all day long.
Frustrated with the generated code slop being heralded by tech social media as the next "coming" for developers, I've written a piece on my frustrations with "vibe coding" and what steps beginners in tech should take in a world of AI-assisted software development.
Nice vibe writing.
Love it. We’ve reached the no-true-scotsman part of the vibe coding hype speedrun.
I'm curious which part of the article led you to that conclusion? It seems to make a pretty reasonable distinction between vibe coding and general use AI in coding to me. It's clearly not hyping up vibe coding, or even presenting it in a positive light.
There’s an article?
Experienced senior developers can spot and fix the slop instantly and still get a 30x productivity gain, whereas entry-level or junior devs basically have only one "filter" by which they determine code quality: "Does the code seem to work?"
Unfortunately "Slop" will appear to work enough of the time to fool a Junior.
Also, the reason junior devs get "slop" is because their prompts are "slop". They don't know the right terminology for things, nor do they have the writing/language skills necessary for good prompting.
EDIT: Due to everyone checking my math I corrected this to 30x, as what's provable, from past experience.
30x productivity gain? gtfo of here.
Most things I try to use it for, it has so many problems with its output that at most I get a 50% productivity gain after fixing everything.
I'm already super efficient at editing text with neovim so honestly for some tasks I end up with a productivity loss.
I can easily get a month of work done in a single day yes. So probably the 30x is about the current max, and 50x was hyperbole, because I didn't add it up before doing that post.
I just don't believe this. It's weird; I just don't know where folks are getting these extreme productivity gains from.
For example, the other day I asked a major LLM to generate a simple markdown viewer with automatic section indentation for me in Node.js. The basic code worked after a few additional prompts from me.
Now I wanted folding. That was also done by the LLM. But then, when I tried to add a few additional simple features, things fell apart. There were one or two seemingly simple runtime errors that the LLM was unable to fix after almost 10 tries.
I could fix it if I started digging inside the code, but then the productivity gains would start to slip away.
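For what it's worth, the core of what I was asking for isn't exotic. Here's a rough sketch (my own hypothetical take, not the LLM's output) of the "automatic section indentation" part in Node.js, just to show the scale of the task:

    // Rough sketch of "indent lines under their markdown section" (hypothetical,
    // not the LLM's output): each line is indented by the depth of the most
    // recent heading above it.
    import * as fs from "node:fs";

    function indentBySection(markdown: string): string {
      let depth = 0;
      const out: string[] = [];
      for (const line of markdown.split("\n")) {
        const heading = line.match(/^(#{1,6})\s/);
        if (heading) {
          depth = heading[1].length;               // number of '#' sets the depth
          out.push("  ".repeat(depth - 1) + line); // headings sit at their own level
        } else {
          out.push("  ".repeat(depth) + line);     // body text one level deeper
        }
      }
      return out.join("\n");
    }

    // Usage: npx ts-node viewer.ts README.md
    console.log(indentBySection(fs.readFileSync(process.argv[2], "utf8")));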
I'll spend like 10 minutes crafting a prompt that explains a new feature to be added to my app. I explain it in enough detail, with zero ambiguity, such that any human [senior] developer could do it. Often the result is 100s of lines of code generated, and well over 95% of the time the code "Claude 4" generates is exactly what I wanted.
I'm using VSCode Github Copilot in "Agent Mode", btw. It's able to navigate around an entire project, understand it, and work on it. You just lean back and watch it open files, edit them, and show you in real time what its thought process is as it does everything. It's truly like magic.
Any other way of doing development, in 2025, is like being in the stone ages.
Your response does not address the example I gave. Sure, if what you are doing is a variation on something that's been done to death, then an LLM is faster at cutting and gluing boilerplate together across multiple files.
Anything beyond that and LLMs require a lot of hand-holding, and frequently regress to boot.
I can't tell you how many times I've seen people write shoddy ambiguous prompts and then blame the LLM for not being able to read their minds.
If you write a prompt with perfect specificity as to what you want done, an agent like "Github Copilot+Claude" can work at about the same level as a senior dev. I do it all day long. It writes complex SQL, complex algorithms, etc.
Saying it only does boilerplate well reminds me of my mother who was brainwashed by a PBS TV show into thinking LLMs can only finish sentences they've seen before and cannot reason thru things.
You're still talking past my points. Look at the example I gave. Does it seem like the problem was due to an ambiguous prompt?
Even if my prompt was ambiguous, the LLM has no excuse for producing code that does not type-check, or that crashes in an obvious way when run. The ambiguity should affect what the code tries to do, not its basic quality.
And your use of totalizing phrases like "zero ambiguity" and "perfect specificity" tells me your arguments are somewhat suspect. There's no such thing as "zero" or "perfect" as far as architecting and implementing code goes.
When it comes to zero ambiguity and perfect specificity, here's how I define it: if I gave the same exact prompt wording to a human, would there be any questions they'd need to ask me before starting the work? If they need to ask a clarifying question before starting, then I wasn't clear; otherwise I was. If you want to balk at phrases like "perfectly clear", you're just nitpicking semantics.
I've worked with some pretty smart people in my career and I've never met anyone who could do "instant" code review.
Actual code review is very slow. More often than not, you're just looking for glaring mistakes, not checking that the code actually respects the specification, which results in the LGTM comment, because you trust the other person's experience. In very critical systems, change is very slow to get in.
Ruling out or refining an approach on the grounds that it's unlikely to lead to a suitable outcome (fixing and removing slop) is not the same as saying this code or approach represents a good-enough outcome given what we currently know about the constraints of the problem (code review).
The "instant" to which you refer was meaning that I can tell instantly if the LLM generated what I wanted or not.
That doesn't mean it's reviewed, it means I'm accepting it to _BE_ what I go with and ultimately review.
> entry-level or junior devs basically have only one "filter" by which they determine code quality: "Does the code seem to work?"
Based on my general experience with software over the last...30 years, most places must only have entry level and junior devs. Somehow despite 30 years of hardware improvement, basic software apps are still as clunky and slow as their '90s counterparts.
The only thing more reckless than a junior is an LLM-empowered junior.
30x-50x :)
Right, if you're getting that, experienced senior is a pretty wild stretch.
> getting a 30x to 50x productivity gain
That is an absurd claim.
If you get a 30x gain then you're a 0.05x developer.
a 50x gain would literally mean you could get a year's worth of work done in a week. Preposterous.
Bad/dumb developers don't get much of a boost, in my experience working with a plethora of shitty contractors. Good developers aren't getting a 30x boost, I don't think, but they are getting more out of the tooling than bad developers.
The bottleneck is still finding good developers, even with the current generation of AI tooling in play.
It was when I started using Github Copilot in "Agent Mode" that my LLM productivity gains went from like 5x to 30x. People who are just using a chatbot get like 5x gains. People who use "Agent Mode" to write up a description of a new feature that would take several days by a human, but get it done in one click by an Agent, are getting 30x or more.
The amount of pushback I got on this thread tells me most devs simply haven't started using actual Agents yet.
I’ve tried using agents. LLMs just can’t reliably accomplish the tasks that I have to do. They just get shit wrong and hallucinate a ton. If I don’t break the task down into tiny chunks then they go off the rails.
This can definitely happen, because the context windows even in a great Agent can become flooded. I often do prompts like "Add a row of buttons at the top right named 'copy', 'cut', and 'paste'", and let the Agent do that, before I implement each button, for example.
The rule of thumb I've learned is to give an Agent the smallest possible task at a time, so there's zero ambiguity in the prompt, and context window is kept small.
One good prompt into Github Copilot 'Agent Mode' (running Claude 4) asking for a new feature can often result in up to 5 to 7 files being generated, and a total of 1000 lines of code being written. Your math is wrong. That's hours of work I didn't do, that only took me the time of describing the new feature with a paragraph of text.
It's ridiculous to equate lines of code to amount of engineering work or value.
A massive amount of valuable work can result in a few lines of code. Conversely, a million lines of code can be useless or even have negative value.
It's all about the quality of your prompts (i.e. your skill at writing clear, unambiguous instructions with the correct terminology).
An experienced developer can generate tons of great code 30x faster with an Agent, with each function/module still being written using the least amount of code possible.
But you're right, the measure of good code isn't 'N', it's '1/N' (the inverse), where N is the number of lines of code it takes to do something. The best code is [almost] always that with the fewest lines, as long as you haven't sacrificed readability in order to remove lines, which I see a lot of juniors do. The rule of thumb is: "least amount of easily understood LOC". If someone can't look at your code for the first time and tell what it's doing, that's normally an indication it's not good code. Claude [almost] never breaks any of these rules.
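As a toy illustration of the "least amount of easily understood LOC" rule (my own example, not generated code), compare two ways of summing non-cancelled order totals:

    // Toy illustration of "least amount of easily understood LOC" (my example,
    // not generated code). Both functions compute the same thing.
    interface Order {
      total: number;
      cancelled: boolean;
    }

    // Verbose: more lines, more mutable state to track while reading.
    function sumTotalsVerbose(orders: Order[]): number {
      let sum = 0;
      for (let i = 0; i < orders.length; i++) {
        const order = orders[i];
        if (order.cancelled === false) {
          sum = sum + order.total;
        }
      }
      return sum;
    }

    // Concise but still obvious at a glance: the 1/N direction.
    const sumTotals = (orders: Order[]): number =>
      orders.filter(o => !o.cancelled).reduce((s, o) => s + o.total, 0);

Going much further than that (golfing it onto one unreadable line) is where the readability caveat kicks in.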
What tech stack are you using? It matters a lot what tech you are using when it comes to how effective the LLMs are.
> Claude [almost] never breaks any of these rules.
Well it does for me, frequently. An example is here: https://news.ycombinator.com/item?id=44126962
Not sure how Claude frequently fails for you, but everybody I know says it rarely fails. I'm definitely not claiming it's perfect tho.
Whenever I can't just sit down and bash out code, it's because the design is wrong. These models are bad at design. I don't see where your 30×–50× could possibly come from.
Most of the time, the only reason I have the code open is to read it. If not for the huge amount of code, I could just print it out and go sit on my sofa.
If I'm dealing with a difficult-to-implement algorithm, a whiteboard is a better help than bashing out code.
That 30x math simply comes from spending 5 minutes typing a prompt and getting code generated that would take a human 2.5 hours (150 minutes, i.e. 30 times the prompting time) to write. This means that in the future most of a developer's time will be spent reviewing code rather than typing it. AI will also be able to write the test cases, so that effort [mostly] vanishes as well.
Unless your job is producing disposable software (e.g. single-use mobile games for short marketing campaigns), this comment suggests you don't know how to do your job. If a piece of the program takes 5 minutes to describe, but 2½ hours to write, you're spending your time in the wrong place, producing code that's legacy almost on day 1. Quoth https://quoteinvestigator.com/2014/03/29/sharp-axe/:
> The text presents to the wood cutter the alternative either to spend time in sharpening his axe, or expend his strength in using a dull one. Which shall he do? Wisdom is profitable to direct.
Sure, you don't need to sharpen your axe. Given a powerful internal combustion engine, you could drive a tank through the forest and fell many trees in rapid succession. But this strategy doesn't leave you with quality lumber, and leaves a huge mess for whoever comes after you (which may be your future self), and one day there won't be any trees left.
If your job is producing disposable software, be aware that you're using unpaid labour to do so. Some of the programmers who produced that AI's training data are struggling to eat and keep a roof over their heads. Act accordingly.
The 5min example is like a maximum/extreme case, yes. My average time spent writing each prompt is probably 30 seconds or less, and coding time saved per prompt like 25 to 60 minutes.
When I do spend minutes (not seconds) writing prompts, it's because I'm actually typing a "Context File" which describes with full clarity certain aspects of my architecture that are relevant to an Agent task set. This context file might have constraints and rules I want the Agent to follow; so I type it once and reference it from like 10 to 20 prompts perhaps. I also keep the prompt files as an archive for the future, so I can always go back and see what my original thoughts were. Also the context files help me do system documentation later.