Learning how to use LLMs in a coding workflow is trivial. There is no learning curve. You can safely ignore them if they don’t fit your workflows at the moment.

I have never heard anybody successfully using LLMs say this before. Most of what I've learned from talking to people about their workflows is counterintuitive and subtle.

It's a really weird way to open up an article concluding that LLMs make one a worse programmer: "I definitely know how to use this tool optimally, and I conclude the tool sucks". Ok then. Also: the piano is a terrible, awful instrument; what a racket it makes.

Fully agree. It takes months to learn how to use LLMs properly. There is an initial honeymoon where the LLMs blow your mind. Then you get some disappointments. But then you start realizing that there are some things that LLMs are good at and some that they are bad at. You start developing a feel for what you can expect them to do. And more importantly, you get into the habit of splitting problems into smaller problems that the LLMs are more likely to solve. You keep learning how to best describe the problem, and you keep adjusting your prompts. It takes time.

It really doesn't take that long. Maybe if you're super junior and have never coded before? In that case I'm glad it's helping you get into the field. Also, if it's taking you months, whole new models will have been released in the meantime and you'll need to learn their quirks all over again.

No, it's a practice. You're not necessarily building technical knowledge; rather, you're building up an intuition. It's for sure not like learning a programming language. It's more like feeling your way along and figuring out how to inhabit a dwelling in the dark. We would just have to agree to disagree on this. I feel exactly as the parent commenter does. But it's not easy to explain (or to understand from someone's explanation).

How very condescending of you.

Love this, and it's so true. A lot of people don't get this, because it's so nuanced. It's not something that's slowing you down. It's not learning a technical skill. Rather, it's building an intuition.

I find it funny when people ask me if it's true that they can build an app using an LLM without knowing how to code. I think of this... that it took me months before I started feeling like I "got it" with fitting LLMs into my coding process. So, not only do you need to learn how to code, but getting to the point that the LLM feels like a natural extension of you has its own timeline on top.

Spot on. I have been coding for the last 25+ years. It took me a while (say, about a week) to start using it meaningfully. I still would not claim I am using it efficiently or have the most productive workflow, which I think is because I keep figuring out new techniques almost daily.

> There is an initial honeymoon where the LLMs blow your mind.

What does this even mean?

In the first year and a half after ChatGPT was released, they lied to me 100% of the time I used them, so I completely missed this honeymoon phase. The first time one answered without problems was about two months ago, and that was also the first time one of them (ChatGPT) answered better than Google/Kagi/DDG could. Even yesterday, I tried to get Claude Opus to tell me when the next concert at Arena Wien is, and it failed miserably. I tried other models from Anthropic too, and they all failed. It successfully parsed the venue's page of upcoming events, then failed miserably anyway. Sometimes it answered with events from the past, sometimes with events in October. The closest it got was 21 August. When I asked what was on 14 August, it apologized and conceded I was right. When I asked about "events", it simply ignored all of the movie nights. When I asked about those specifically, it was as if I had started a new conversation.

The only time they produced anything comparable to my code in quality was when they had a ton of examples of tests which looked almost the same. Even then, they made mistakes… in a case where I basically had to change two lines, so copy-pasting would have been faster.

There was an AI advocate here who was so confident in his AI skills that he showed exactly the thing most people here try to avoid: he recorded how he works with AIs. Here is the catch: it showed the same thing. There were already examples, and he needed only minimal modifications for the new code. Even then, copy-pasting would have been quicker and would have contained fewer mistakes… which he kept in the code, because they didn't fail right away.

I'm glad you feel like you've nailed it. I've been using models to help me code for over two years, and I still feel like I have no idea what I'm doing.

I feel like every time I write a prompt or use a new tool, I'm experimenting with how to make fire for the first time. It's not to say that I'm bad at it. I'm probably better than most people. But knowing how to use this tool is by far the largest challenge, in my opinion.

Months? That’s actually an insanely long time

I dunno, man. I think you could have spent that time, you know, learning to code instead.

Sure. But it happens that I have 20 years of experience, and I know quite well how to code. Everything the LLM does for me I can do myself. But the LLM does it 100 times faster than me. Most days nowadays I push thousands of lines of code. And it's not garbage code; the LLMs write quite high quality code. Of course, I still have to go through the code and make sure it all makes sense. So I am still the bottleneck. At some point I will probably grow to trust the LLM, but I'm not quite there yet.

> Most days nowadays I push thousands of lines of code

Insane stuff. It’s clear you can’t review that many changes in a day, so you’re just flooding your codebase with code that you barely read.

Or is your job just re-doing the same boilerplate over and over again?

You are a bit quick to jump to conclusions. With LLMs, test-driven development becomes both a necessity and a pleasure. The actual functional code I push in a day is probably in the low hundreds of LOC. But I push a lot of tests too. And sure, lots of that is boilerplate. But the tests run, pass, and if anything have better coverage than when I was writing all the code myself.
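For readers who haven't worked this way: a minimal sketch of the test-first loop being described, under the assumption that the human writes (or at least carefully reviews) the tests and the LLM writes the implementation until they pass. The module and function names here are made up for illustration, not the commenter's actual code.

    # test_slugify.py -- the hand-written / hand-reviewed spec.
    # "text_utils" and "slugify" are hypothetical names; the LLM is asked
    # to implement text_utils.slugify until this file passes under pytest.
    from text_utils import slugify

    def test_lowercases_and_dashes():
        assert slugify("Hello, World!") == "hello-world"

    def test_collapses_whitespace():
        assert slugify("  a   b  c ") == "a-b-c"

    def test_empty_string():
        assert slugify("") == ""

Run with pytest; the implementation does not exist yet, which is the point of writing the spec first.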

If you have 20 years of experience, then you know that the number of lines of code is always inversely proportional to code quality.

> ...thousands of lines of code ... quite high quality

A contradiction in terms.

Here’s an experiment for the two of us: we should both bookmark this page and revisit it one year from now. It is likely that at least one of us, maybe both, will see the world differently.

it is, mind you, exactly the same experience as working on a team with lots of junior engineers, and delegating work to them

Wait a minute, you didn't just claim that we have reached AGI, right? I mean, that's what it would mean to delegate work to junior engineers, right? You're delegating work to human level intelligence. That's not what we have with LLMs.

Yes and no. With junior developers you need to educate them. You need to do that with LLMs too. Maybe you need to break down the problem into smaller chunks, but you get the hang of this after a while. But once the LLM understands the task, you get a few hundred lines of code in a matter of minutes. With a junior developer you are lucky if they come back the same day. The iteration speed with AI is simply in a different league.

Edit: it is Sunday. As I am relaxing and spending time writing answers on HN, I keep a lazy eye on the progress of an LLM at work too. Just by clicking a "Continue" button now and then, I have gotten stuff done that would have taken me a few days of work.

> Learning how to use LLMs in a coding workflow is trivial. There is no learning curve. You can safely ignore them if they don’t fit your workflows at the moment.

That's a wild statement. I'm now extremely productive with LLMs in my core codebases, but it took a lot of practice to get it right and repeatable. There's a lot of little contextual details you need to learn how to control so the LLM makes the right choices.

Whenever I start working in a new code base, it takes a non-trivial amount of time to ramp back up to full LLM productivity.

Is the non-trivial amount of time significantly less than you trying to ramp up yourself?

I am still hesitant to use AI to solve problems for me. Either it hallucinates and misleads me, or it does a great job and I worry that my ability to reason through complex problems with rigor will degenerate. Once my ability to solve complex problems has degenerated, my patience has diminished, and my attention span is destroyed, I will be utterly reliant on a service that other entities own just to perform in my daily life. Genuine question - are people comfortable with this?

The ramp-up time with AI is absolutely lower than trying to ramp up without AI.

My comment is specifically in contrast to working in a codebase where I'm at "max AI productivity". In a new codebase, it just takes a bit of time to work out kinks and figure out tendencies of the LLMs in those codebases. It's not that I'm slower than I'd be without AI, I'm just not at my "usual" AI-driven productivity levels.

Don't use it as a solution machine.

You use it when you know how to do something and know exactly what the solution looks like, but can't be arsed to do it. Like most UI work where you just want something in there with the basic framework to update content etc. There's nothing challenging in doing it, you know what has to be done, but figuring out the weird-ass React footguns takes time. Most LLMs can one-shot it with enough information.

You can also use it as a rubber duck, ask it to analyse some code, read and see if you agree. Ask for improvements or modifications, read and see if you agree.

>Genuine question - are people comfortable with this?

It's a question of degree, but in general, yeah. I'm totally comfortable being reliant on other entities to solve complex problems for me.

That's how economies work [1]. I neither have nor want to acquire the lifetime of experience I would need to learn how to produce the tea leaves in my tea, or the clean potable water in it, or the mug they are contained within, or the concrete walls 50 meters up from ground level I am surrounded by, or so on and so forth. I can live a better life by outsourcing the need for this specialized knowledge to other people, and trade with them in exchange for my own increasingly-specialized knowledge. Even if I had 100 lifetimes to spend, and not the 1 I actually have, I would probably want to put most of them to things that, you know, aren't already solved-enough problems.

Everyone doing anything interesting works like this, with vanishingly few exceptions. My dad doesn't need to know how to do algebra to get his taxes done, he just has an accountant. And his accountant doesn't need to know how to rewire his turn of the century New England home. And if you look at the exceptions, like that really cute 'self sufficient' family who uploads weekly YouTube videos called "Our Homestead Life"... It often turns out that the revenue from that YouTube stream is nontrivial to keeping the whole operation running. In other words, even if they genuinely no longer go to Costco, it's kind of a gyp.

[1]: https://www.youtube.com/watch?v=67tHtpac5ws

> My dad doesn't need to know how to do algebra to get his taxes done, he just has an accountant.

This is not quite the same thing. The AI is not perfect; it frequently makes mistakes or writes suboptimal code. As a software engineer, you are responsible for finding and fixing those. This means you have to review and fully understand everything that the AI has written.

Quite a different situation than your dad and his accountant.

I see your point. I don't think it's different in kind, just degree. My thought process: First, is my dad's accountant infallible?

If not, then they must themselves make mistakes or do things suboptimally sometimes. Whose responsibility is that - my dad, or my dad's accountant?

If it is my dad, does that then mean my dad has an obligation to review and fully understand everything the accountant has written?

And do we have to generalize that responsibility to everything and everyone my dad has to hand off work to in order to get something done? Clearly not, that's absurd. So where do we draw the line? You draw it in the same place I do for right now, but I don't see why we expect that line to be static.

> This means you have to review and fully understand everything that the AI has written.

Yes, and people who care and are knowledgeable do this already. I do this, for one.

But there’s no way one gives as thorough a review as if one had written the code to solve the problem oneself. Writing is understanding. You’re trading thoroughness and integrity for chance.

Writing code should never have been a bottleneck. And since it wasn’t, any massive gains are due to being OK with trusting the AI.

I would honestly say, it's more like autocomplete on steroids, like you know what you want so you just don't wanna type it out (e.g. scripts and such)

And so if you don't use it then someone else will... But as for the models, we already have some pretty good open-source ones like Qwen, and it'll only get better from here, so I'm not sure why that last part would be a dealbreaker.

He’s not wrong.

Getting 80% of the benefit of LLMs is trivial. You can ask it for some functions or to write a suite of unit tests and you’re done.

The last 20%, while possible to attain, is ultimately not worth it for the amount of time you spend in context hells. You can just do it yourself faster.

> The last 20%, while possible to attain, is ultimately not worth it for the amount of time you spend in context hells. You can just do it yourself faster.

I'm arguing that there's a skill that has to be learned in order to break through this. As you start in a new code base, you should be quick to jump in when you hit that 20%. But, as you spend more time in it, you learn how to avoid the same "context hell" issues and move that number down to 15%, 10%, 5% of the time.

You're still going to need to jump in, but when you can learn to get the LLM to write 95% of the code for you, that's incredibly powerful.

It’s not incredibly powerful, it’s incrementally powerful. Getting the first 80% via LLM is already the incredible power. A sufficiently skilled developer should be able to handle the rest with ease. It is not worth doing anything unnatural in an effort to chase down the last 20%; you are just wasting time and atrophying skills. If you can get to the full 95% in some one-shot prompts, great. But don’t go chasing waterfalls.

No, being able to push it closer to the boundary actually has an exponential-growth type of effect on productivity.

I’m making this a bit contrived, but I’m simplifying it to demonstrate the underlying point.

When an LLM is 80% effective, I’m limited to doing 5 things in parallel, since I still need to jump in 20% of the time.

When an LLM is 90% effective, I can do 10 things at once. When it’s 95%, 20 things. 99%, 100 things.

Now, obviously I can’t actually juggle 10 or 20 things at once. However, the point is that there are massive productivity gains to be had when you can reduce your involvement in a task from 20% to even 10%. You’re effectively 2x as productive.
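A back-of-the-envelope version of that arithmetic (the numbers are purely illustrative, nothing here is measured):

    # If a task only needs my attention a fraction f of its wall-clock time,
    # my attention becomes the limit at roughly 1/f tasks interleaved in parallel.
    for f in (0.20, 0.10, 0.05, 0.01):
        print(f"involvement {f:.0%} -> roughly {1 / f:.0f} tasks before I'm the bottleneck")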

I’d bet you don’t even have 2 or 3 things to do at once, much less 100. So it’s pointless to chase those types of coverages.

Do you understand what parallel means? Most LLMs respond in seconds, so there is no parallel work for you to do there.

Or do you mean you are using long running agents to do tasks and then review those? I haven't seen such a workflow be productive so far.

I run through a really extensive planning step that generates technical architecture and iterative tasks. I then send an LLM along to implement each step, debugging, iterating, and verifying its work (a rough sketch of this kind of loop is below). It's not uncommon for it to take a non-trivial amount of time to complete a step (5+ minutes).

Right now, I still need to intervene enough that I'm not actually doing a second coding project in parallel. I tend to focus on communication, documentation, and other artifacts that support the code I'm writing.

However, I am very close to hitting that point and occasionally do on easier tasks. There's a _very_ real tipping point in productivity when you have confidence that an LLM can accomplish a certain task without your intervention. You can start to do things legitimately in parallel when you're only really reviewing outputs and doing minor tweaks.
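For what it's worth, here is a very rough sketch of that plan-then-implement-then-verify loop. The plan.json layout, the pytest check, the three-attempt budget, and the run_agent stub are all assumptions for illustration, not a description of the commenter's actual tooling.

    # Drive a coding agent through an ordered list of planned tasks,
    # verifying each step with the test suite before moving on.
    import json
    import subprocess

    def run_agent(prompt: str) -> None:
        # Stub: wire this to whatever coding agent (CLI or API) you use.
        raise NotImplementedError("invoke your coding agent here")

    def tests_pass() -> bool:
        # "Verifying its work" here simply means the test suite is green.
        return subprocess.run(["pytest", "-q"]).returncode == 0

    with open("plan.json") as f:   # one entry per planned step
        tasks = json.load(f)

    for task in tasks:
        for attempt in range(3):   # give the agent a few tries per step
            run_agent(f"Implement this step, then make the tests pass:\n{task}")
            if tests_pass():
                break
        else:
            print(f"Step needs human intervention: {task}")
            break

The interesting ratio is how often you land in that else branch; the less often, the closer you are to the "only reviewing outputs and doing minor tweaks" mode described above.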

> I'm arguing that there's a skill that has to be learned in order to break through this. As you start in a new code base, you should be quick to jump in when you hit that 20%. But, as you spend more time in it, you learn how to avoid the same "context hell" issues and move that number down to 15%, 10%, 5% of the time.

The problem is that you're learning a skill that will need refinement each time you switch to a new model. You will redo some of this learning on each new model you use.

This actually might not be a problem anyway, as all the models seem to be converging asymptotically towards "programming".

The better they do on the programming benchmarks, the further away from AGI they get.

Exactly. People delude themselves thinking this is productivity. Tweaking prompts to get it "right" is very wasteful.

> That's a wild statement. I'm now extremely productive with LLMs in my core codebases, but it took a lot of practice to get it right and repeatable. There's a lot of little contextual details you need to learn how to control so the LLM makes the right choices.

> Whenever I start working in a new code base, it takes a non-trivial amount of time to ramp back up to full LLM productivity.

Do you find that these details translate between models? Sounds like it doesn't translate across codebases for you?

I have mostly moved away from this sort of fine-tuning approach because of an experience a while ago with OpenAI's ChatGPT 3.5 and 4. Extra work on my end that was necessary with the older model wasn't with the new one, and sometimes it counterintuitively caused worse performance by pointing the model at the way I'd do it rather than the way it might have the best luck with. ESPECIALLY for the sycophantic models, which will heavily index on "if you suggested that this thing might be related, I'll figure out some way to make sure it is!"

So more recently I generally stick to the "we'll handle a lot of the prompt nitty-gritty for you" IDE or CLI agent stuff, but I find they still fall apart with large, complex codebases, and also that the tricks don't translate across codebases.

Yes and no. The broader business context translates well, but each model has its own blind spots and hyperfocuses that you need to massage out.

* Business context - these are things like code quality/robustness, expected spec coverage, expected performance needs, domain specific knowledge. These generally translate well between models, but can vary between code bases. For example, a core monolith is going to have higher standards than a one-off auxiliary service.

* Model focuses - Different models have different tendencies when searching a code base and building up their context. These are specific to each code base, but relatively obvious when they happen. For example, in one code base I work in, one model always seems to pick up our legacy notification system while another model happens to find our new one. It's not really a skill issue. It's just the luck of the draw in how files are named and how each model searches. They each just find a "valid" notification pattern in a different order.

LLMs are massively helpful for orienting to a new codebase, but it just takes some time to work out those little kinks.

This is like UB in compilers but 100x worse, because there's no spec, it's not even documented, and it could change without a compiler update.

It is nothing at all like UB in a compiler. UB creates invisible bugs that tend to be discovered only after things have shipped. This is code generation. You can just read the code to see what it does, which is what most professionals using LLMs do.

With the volume of code people are generating, no you really can't just read it all. pg recently posted [1] that someone he knows is generating 10kloc/day now. There's no way people are using AI to generate that volume of code and reading it. How many invisible bugs are lurking in that code base, waiting to be found some time in the future after the code has shipped?

[1] https://x.com/paulg/status/1953289830982664236

I read every line I generate and usually adjust things; I'm uncomfortable merging a PR I haven't put my fingerprints on somehow. From the conversations I have with other practitioners, I think this is pretty normal. So, no, I reject your premise.

My premise didn't have anything to do with you, so what you do isn't a basis for rejecting it. No matter what you or your small group of peers do, AI is generating code at a volume that all the developers in the world combined couldn't read if they dedicated 24hrs/day.

[dead]

> I have never heard anybody successfully using LLMs say this before. Most of what I've learned from talking to people about their workflows is counterintuitive and subtle.

Because for all our posturing about being skeptical and data driven we all believe in magic.

Those "counterintuitive non-trivial workflows"? They work about as well as just prompting "implement X" with no rules, agents.md, careful lists etc.

Because 1) literally no one actually measures whether these magical incantations work, and 2) it's impossible to make such measurements due to non-determinism.

The problem with your argument here is that you're effectively saying that developers (like myself) who put effort into figuring out good workflows for coding with LLMs are deceiving themselves, and are effectively wasting their time.

Either I've wasted significant chunks of the past ~3 years of my life or you're missing something here. Up to you to decide which you believe.

I agree that it's hard to take solid measurements due to non-determinism. The same goes for managing people, and yet somehow many good engineering managers can judge if their team is performing well and figure out what levers they can pull to help them perform better.

That's not a problem, that is the argument. People are bad at measuring their own productivity. Just because you feel more productive with an LLM does not mean you are. We need more studies and less anecdata

I'm afraid all you're going to get from me is anecdata, but I find a lot of it very compelling.

I talk to extremely experienced programmers whose opinions I have valued for many years before the current LLM boom who are now flying with LLMs - I trust their aggregate judgement.

Meanwhile my own https://tools.simonwillison.net/colophon collection has grown to over 120 in just a year and a half, most of which I wouldn't have built at all - and that's a relatively small portion of what I've been getting done with LLMs elsewhere.

Hard to measure productivity on a "wouldn't exist" to "does exist" scale.

Every time you post about this stuff you get at least as much pushback as you get affirmation, and yet when you discuss anything related to peer responses, you never seem to mention or include any of that negative feedback, only the positive...

I don't get it, what are you asking me to do here?

You want me to say "this stuff is really useful, here's why I think that. But lots of people on the internet have disagreed with me, here's links to their comments"?

> my own https://tools.simonwillison.net/colophon collection has grown to over 120

What in the wooberjabbery is this even.

List of single-commit LLM generated stuff. Vibe coded shovelware like animated-rainbow-border [1] or unix-timestamp [2].

Calling these tools seems to be overstating it.

1: https://gist.github.com/simonw/2e56ee84e7321592f79ceaed2e81b...

2: https://gist.github.com/simonw/8c04788c5e4db11f6324ef5962127...

Cool right? It's my playground for vibe coded apps, except I started it nearly a year before the term "vibe coding" was introduced.

I wrote more about it here: https://simonwillison.net/2024/Oct/21/claude-artifacts/ - and a lot of them have explanations in posts under my tools tag: https://simonwillison.net/tags/tools/

It might also be the largest collection of published chat transcripts for this kind of usage from a single person - though that's not hard since most people don't publish their prompts.

Building little things like this is a really effective way of gaining experience using prompts to get useful code results out of LLMs.

> Cool right?

100s of single-commit, AI-generated trash apps along the lines of "make the css background blue".

On display.

Like it's something.

You can't be serious.

[flagged]

I've been using LLM-assistance for my larger open source projects - https://github.com/simonw/datasette https://github.com/simonw/llm and https://github.com/simonw/sqlite-utils - for a couple of years now.

Also literally hundreds of smaller plugins and libraries and CLI tools, see https://github.com/simonw?tab=repositories (now at 880 repos, though a few dozen of those are scrapers and shouldn't count) and https://pypi.org/user/simonw/ (340 published packages).

Unlike my tools.simonwillison.net stuff the vast majority of those products are covered by automated tests and usually have comprehensive documentation too.

What do you mean by my script?

The whole debate about LLMs and productivity consistently brings the "don't confuse movement with progress" warning to my mind.

But it was already a warning before LLMs because, as you wrote, people are bad at measuring productivity (among many things).

Another problem with it is that you could have said the same thing about virtually any advancement in programming over the last 30 years.

There have been so many "advances" in software development in the last decades - powerful type systems, null safety, sane error handling, Erlang-style fault tolerance, property testing, model checking, etc. - and yet people continue to write garbage code in unsafe languages with underpowered IDEs.

I think many in the industry have absolutely no clue what they're doing and are bad at evaluating productivity, often prioritising short term delivery over longterm maintenance.

LLMs can absolutely be useful but I'm very concerned that some people just use them to churn out code instead of thinking more carefully about what and how to build things. I wish we had at least the same amount of discussions about those things I mentioned above as we have about whether Opus, Sonnet, GPT5 or Gemini is the best model.

> I wish we had at least the same amount of discussions about those things I mentioned above as we have about whether Opus, Sonnet, GPT5 or Gemini is the best model.

I mean, we do. I think programmers are more interested in long-term maintainable software than its users are. Generally that makes sense; a user doesn't really care how much effort it takes to add features or fix bugs, those are things that programmers care about. Moreover, the cost of mistakes in most software is so low that most people don't seem interested in paying extra for more reliable software. The few areas of software that require high reliability are the ones that are regulated or are sold by companies that offer SLAs or other such reliability agreements.

My observation over the years is that maintainability and reliability are much more important to programmers who comment in online forums than they are to users. It usually comes with the pride programmers take in their work, but my observation is that this has little market demand.

Users definitely care about things like reliability when they're using actually important software (which probably excludes a lot of startup junk). They may not be able to point to what causes issues, but they obviously do complain when things are buggy as hell.

> I think programmers are more interested in long term maintainable software than its users are.

Please talk to your users

> who put effort into figuring out good workflows for coding with LLMs are deceiving themselves, and are effectively wasting their time.

It's quite possible you do. Do you have any hard data justifying the claims of "this works better", or is it just a soft fuzzy feeling?

> The same goes for managing people, and yet somehow many good engineering managers can judge if their team is performing well

It's actually really easy to judge if a team is performing well.

What is hard is finding what actually makes the team perform well. And that is just as much magic as "if you just write the correct prompt everything will just work"

---

wait. why are we fighting again? :) https://dmitriid.com/everything-around-llms-is-still-magical...

I'm not the OP and I'm not saying you are wrong, but I am going to point out that the data doesn't necessarily back up significant productivity improvements with LLMs.

In this video (https://www.youtube.com/watch?v=EO3_qN_Ynsk) they present a slide by the company DX, which surveyed 38,880 developers across 184 organizations and found the surveyed developers claiming an average time saving of 4 hours per developer per week. So all of these LLM workflows are only making the average developer about 10% more productive in a given work week (4 of roughly 40 hours), with a bunch of developers getting less. Few developers are attaining productivity higher than that.

In this video by Stanford researchers actively researching productivity using GitHub commit data for private and public repositories (https://www.youtube.com/watch?v=tbDDYKRFjhk), they share a few very important data points:

1. They've found zero correlation between how productive respondents claim to be and how productive they actually measure as, meaning people are poor judges of their own productivity numbers. This would refute the claim in my previous point, but only if you assume people are on average wildly more productive than they claim.

2. They have measured an actual increase in rework and refactoring commits in the measured repositories as AI tools come into wider use in those organizations. So even though things ship faster, they are observing an increased number of pull requests that have to fix those previous pushes.

3. They have measured pretty good productivity gains for greenfield, low-complexity systems, but as you move toward higher-complexity or brownfield systems they start to measure much lower productivity gains, and even negative productivity with AI tools.

This goes hand in hand with this research paper: https://metr.org/blog/2025-07-10-early-2025-ai-experienced-o... in which experienced devs on significant long-term projects lost productivity when using AI tools, yet were fully convinced the tools were making them even more productive.

Yes, all of these studies have their flaws and nitpicks we can go over that I'm not interested in rehashing. However, there's a lot more data and studies that show AI having very marginal productivity boost compared to what people claim than vice versa. I'm legitimately interested in other studies that can show significant productivity gains in brownfield projects.

So far I've found that the people who are hating on AI are stuck maintaining highly coupled code that they've invested a significant amount of mental energy internalizing. AI is bad on that type of code, and since they've invested so much energy in understanding the code, it ends up taking longer for them to load context and guide the AI than to just do the work. Their code base is hot coupled garbage, and rather than accept that the tools aren't working because of their own lack of architectural rigor, they just shit on the tools. This is part of the reason that that study of open source maintainers using Cursor didn't consistently produce improvement (also, Cursor is pretty mid).

https://www.youtube.com/watch?v=tbDDYKRFjhk&t=4s is one of the largest studies I've seen so far and it shows that when the codebase is small or engineered for AI use, >20% productivity improvements are normal.

On top of this, a lot of “learning to work with LLMs” is breaking down tasks into small pieces with clear instructions and acceptance criteria. That’s just part of working efficiently, but maybe people don’t want to be bothered to do it.

Working efficiently as a team, perhaps, but during solo development this is unnecessary beyond the extent necessary to document the code.

Even this opens up a whole field of weird subtle workflow tricks people have, because people run parallel asynchronous agents that step on each other in git. Solo developers run teams now!

Really wild to hear someone say out loud "there's no learning curve to using this stuff".

The "learning curve" is reading "experts opinion" on the ever-changing set of magical rituals that may or may not work but trust us it works.

No, you do not need to trust anyone, you can just verify what works and what doesn't, it's very easy.

Indeed. And it's extremely easy to verify my original comment: https://news.ycombinator.com/item?id=44849887

I agree with your assessment of this statement. I had to reread it a few times to actually understand it.

He is actually recommending Copilot for price/performance reasons and his closing statement is "Don’t fall for the hype, but also, they are genuinely powerful tools sometimes."

So, it just seems like he never really tried to engineer better prompts that these more advanced models can take advantage of.

The OP's point seems to be: it's very quick for LLMs to be a net benefit to your skills, if they are a benefit at all. That is, he's only speaking of the very beginning of the learning curve.

The first two points directly contradict each other, too. Learning a tool should have the outcome that one is productive with it. If getting to "productive" is non-trivial, then learning the tool is non-trivial.

Agreed. This is an astonishingly bad article. It's clear that the only reason it made it to the front page is because people who view AI with disdain or hatred upvoted it. Because as you say: how can anyone make authoritative claims about a set of tools not just without taking the time to learn to use them properly, but also believing that they don't even need to bother?

Would it be more appropriate to compare LLMs to Autotunes rather than pianos?

I've said it before, I feel like I'm some sort of lottery winner when it comes to LLM usage.

I've tried a few things that have mostly been positive. Starting with Copilot's in-line "predictive text on steroids", which works really well. It's definitely faster and more accurate than me typing in a traditional IntelliSense IDE. For me, this level of AI is can't-lose: it's very easy to see if a few lines of prediction are what you want.

I then did Cursor for a while, and that did what I wanted as well. Multi-file edits can be a real pain. Sometimes, it does some really odd things, but most of the time, I know what I want, I just don't want to find the files, make the edits on all of them, see if it compiles, and so on. It's a loop that you have to do as a junior dev, or you'll never understand how to code. But now I don't feel I learn anything from it, I just want the tool to magically transform the code for me, and it does that.

Now I'm on Claude. Somehow, I get a lot fewer excursions from what I wanted. I can do much more complex code edits, and I barely have to type anything. I sort of tell it what I would tell a junior dev. "Hey let's make a bunch of connections and just use whichever one receives the message first, discarding any subsequent copies". If I was talking to a real junior, I might answer a few questions during the day, but he would do this task with a fair bit of mess. It's a fiddly task, and there are assumptions to make about what the task actually is.

Somehow, Claude makes the right assumptions. Yes, indeed I do want a test that can output how often each of the incoming connections "wins". Correct, we need to send the subscriptions down all the connections. The kinds of assumptions a junior would understand and come up with himself.

I spend a lot of time with the LLM critiquing, rather than editing. "This thing could be abstracted, couldn't it?" and then it looks through the code and says "yeah I could generalize this like so..." and it means instead of spending my attention on finding things in files, I look at overall structure. This also means I don't need my highest level of attention, so I can do this sort of thing when I'm not even really able to concentrate, eg late at night or while I'm out with the kids somewhere.

So yeah, I might also say there's very little learning curve. It's not like I opened a manual or tutorial before using Claude. I just started talking to it in natural language about what it should do, and it's doing what I want. Unlike seemingly everyone else.

Pianists' results are well known to be proportional to their talent and effort. In open source, hardly anyone is even using LLMs, and the ones that do have barely any output; in many cases less output than they had before using LLMs.

The blogging output on the other hand ...

> In open source, hardly anyone is even using LLMs, and the ones that do have barely any output; in many cases less output than they had before using LLMs.

That is not what that paper said, lol.

Which paper? The quoted part is my own observation.

Oh I see, I thought you were quoting https://metr.org/blog/2025-07-10-early-2025-ai-experienced-o... "Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity"

Which shows that LLMs, when given to devs who are inexperienced with LLMs but are very experienced with the code they're working on, don't provide a speedup even though it feels like it.

Which is of course a very constrained scenario. IME the LLM speedup is mostly in greenfield projects using APIs and libraries you're not very experienced with.

I agree with you and I have seen this take a few times now in articles on HN, which amounts to the classic: "We've tried nothing and we're all out of ideas" Simpson's joke.

I read these articles and I feel like I am taking crazy pills sometimes. The person, enticed by the hype, makes a transparently half-hearted effort for just long enough to confirm their blatantly obvious bias. They then act like they now have the ultimate authority on the subject to proclaim that their pre-conceived notions were definitely true beyond any doubt.

Not all problems yield well to LLM coding agents. Not all people will be able or willing to use them effectively.

But I guess "I gave it a try and it is not for me" is a much less interesting article compared to "I gave it a try and I have proved it is as terrible as you fear".