> Programmers resistance to AI assisted programming has lowered considerably. Even if LLMs make mistakes, the ability of LLMs to deliver useful code and hints improved to the point most skeptics started to use LLMs anyway: now the return on the investment is acceptable for many more folks.
I'm not a fan of this phrasing. Use of the terms "resistance" and "skeptics" implies they were wrong. It's important we don't engage in revisionist history that allows people in the future to say "Look at the irrational fear programmers had of AI, which turned out to be wrong!" The change occurred because LLMs are useful for programming in 2025 and the earliest versions weren't for most programmers. It was the technology that changed.
"Skeptics" is also a loaded term; what does it actually mean? I find LLMs incredibly useful for various programming tasks (generating code, searching documentation, and yes with enough setup agents can accomplish some tasks), but I also don't believe they have actual intelligence, nor do I think they will eviscerate programming jobs, the same way that Python and JavaScript didn't eviscerate programming jobs despite lowering the barrier to entry compared to Java or C. Does that make me a skeptic?
It's easy to declare "victory" when you're only talking about the maximalist position on one side ("LLMs are totally useless!") vs the minimalist position on the other side ("LLMs can generate useful code"). The AI maximalist position of "AI is going to become superintelligent and make all human work and intelligence obsolete" has certainly not been proven.
No, that doesn’t make you a skeptic in this context.
The LLM skeptics claim LLM usefulness is an illusion. That the LLMs are a fad, and they produced more problems than they solve. They cite cherry picked announcements showing that LLM usage makes development slower or worse. They opened ChatGPT a couple times a few months ago, asked some questions, and then went “Aha! I knew it was bad!” when they encountered their first bad output instead of trying to work with the LLM to iterate like everyone who gets value out of them.
The skeptics are the people in every AI thread claiming LLMs are a fad that will go away when the VC money runs out, that the only reason anyone uses LLMs is because their boss forces them to, or who blame every bug or security announcement on vibecoding.
Skeptic here: I do think LLMs are a fad for software development. They're an interesting phenomenon that people have convinced themselves MUST BE USEFUL in the context of software development, either through ignorance or a sense of desperation. I do not believe LLMs will be used long term for any kind of serious software development use case, as the maintenance cost of the code they produce will run development teams into bankruptcy.
I also believe the current generations of LLMs (transformers) are technical dead ends on the path to real AGI, and the more time we spend hyping them, the less research/money gets spent on discovering new/better paths beyond transformers.
I wish we could go back to complaining about Kubernetes, focusing on scaling distributed systems, and solving more interesting problems than comparing winnings on a stochastic slot machine. I wish our industry were held to higher standards than jockeying bug-ridden MVP code as quickly as possible.
Here[1] is a recent submission from Simon Willison using GPT-5.2 to port a Python HTML-parsing library to JavaScript in 4.5 hours. The code passes the 9,200 test cases of html5lib-tests used by web browsers. That's a workable, usable, standards-compliant (as much as the test cases are) HTML parser in <5 hours. For <$30. While he went shopping and watched TV. The Python library it was porting from was also mostly vibe-coded[2] against the same test cases, with the LLM referencing a Rust parser.
Almost no human could port 3000 lines of Python to JavaScript and test it in their spare time while watching TV and decorating a Christmas tree. Almost no human you can employ would do a good job of it for $6/hour and have it done in 5 hours. How is that "ignorance or a sense of desperation" and "not actually useful"?
[1] https://simonwillison.net/2025/Dec/15/porting-justhtml/
[2] https://simonwillison.net/2025/Dec/14/justhtml/
I think both of those experiments do a good job of demonstrating utility on a certain kind of task.
But this is cherry-picking.
In the grand scheme of the work we all collectively do, very few programming projects entail something even vaguely like generating an Nth HTML parser in a language that already has several wildly popular HTML parsers--or porting that parser into another language that has several wildly popular HTML parsers.
Even fewer tasks come with a library of 9k+ tests to sharpen our solutions against. (Which itself wouldn't exist without experts trodding this ground thoroughly enough to accrue them.)
The experiments are incredibly interesting and illuminating, but I feel like it's verging on gaslighting to frame them as proof of how useful the technology is when it's hard to imagine a more favorable situation.
> "it's hard to imagine a more favorable situation"
Granted, but this reads a bit like a headline from The Onion: "'Hard to imagine a more favourable situation than pressing nails into wood' said local man unimpressed with neighbour's new hammer".
I think it's a strong enough example to disprove "they're an interesting phenomenon that people have convinced themselves MUST BE USEFUL ... either through ignorance or a sense of desperation". Not enough to claim they are always useful in all situations or to all people, but I wasn't trying for that. You (or the person I was replying to) basically have to make the case that Simon Willison is ignorant about LLMs and programming, is desperate about something, or is deluding himself that the port worked when it actually didn't, to keep the original claim. And I don't think you can. He isn't hyping an AI startup, he has no profit motive to delude him. He isn't a non-technical business leader who can't code being baffled by buzzwords. He isn't new to LLMs and wowed by the first thing. He gave a conference talk showing that LLMs cannot draw pelicans on bicycles so he is able to admit their flaws and limitations.
> "But this is cherry-picking."
Is it? I can't use an example where they weren't useful or failed. It makes no sense to try and argue how many successes vs. failures, even if I had any way to know that; any number of people failing at plumbing a bathroom sink doesn't prove that plumbing is impossible or not useful. One success at plumbing a bathroom sink is enough to demonstrate that it is possible and useful - it doesn't need dozens of examples - even if the task is narrowly scoped and well-trodden. If a Tesla humanoid robot could plumb in a bathroom sink, it might not be good value for money, but it would be a useful task. If it could do it for $30 it might be good value for money as well, even if it couldn't do any other tasks at all, right?
> Granted, but this reads a bit like a headline from The Onion: "'Hard to imagine a more favourable situation than pressing nails into wood' said local man unimpressed with neighbour's new hammer".
Chuffed you picked this example to ~sneer about.
There's a near-infinite list of problems one can solve with a hammer, but there are vanishingly few things one can build with just a hammer.
> You (or the person I was replying to) basically have to make the case that Simon Willison is ignorant about LLMs and programming, is desperate about something, or is deluding himself that the port worked when it actually didn't, to keep the original claim.
I don't have to do any such thing.
I said the experiments were both interesting and illuminating and I meant it. But that doesn't mean they will generalize to less-favorable problems. (Simon's doing great work to help stake out what does and doesn't work for him. I have seen every single one of the posts you're alluding to as they were posted, and I hesitated to reply here because I was leery someone would try to frame it as an attack on him or his work.)
> Is it? I can't use an example where they weren't useful or failed.
> any number of people failing at plumbing a bathroom sink doesn't prove that plumbing is impossible or not useful. One success at plumbing a bathroom sink is enough to demonstrate that it is possible and useful - it doesn't need dozens of examples - even if the task is narrowly scoped and well-trodden.
This smells like sleight of hand.
I'm happy to grant this (with a caveat^) if your point is that this success proves LLMs can build an HTML parser in a language with several popular source-available examples and thousands of tests (and probably many near-identical copies of the underlying HTML specs as they evolve) with months of human guidance^ and (with much less guidance) rapidly translate that parser into another language with many popular source-available answers and the same test suite. Yes--sure--one example of each is proof they can do both tasks.
But I take your GP to be suggesting something more like: this success at plumbing a sink inside the framework an existing house with plumbing provides is proof that these things can (or will) build average fully-plumbed houses.
^Simon, who you noted is not ignorant about LLMs and programming, was clear that the initial task of getting an LLM to write the first codebase that passed this test suite took Emil months of work.
> If a Tesla humanoid robot could plumb in a bathroom sink, it might not be good value for money, but it would be a useful task. If it could do it for $30 it might be good value for money as well even if it couldn't do any other tasks at all, right?
The only part of this that appears to have been done for about $30 was the translation of the existing codebase. I wouldn't argue that accomplishing this task for $30 isn't impressive.
But, again, this smells like sleight of hand.
We have probably plumbed billions of sinks (and hopefully have billions or even trillions more to go), so any automation that can do one for $30 has clear value.
A world with a billion well-tested HTML parsers in need of translation is likely one kind of hell or another. Proof an LLM-based workflow can translate a well-tested HTML parser for $30 is interesting and illuminating (I'm particularly interested in whether it'll upend how hard some of us have to fight to justify the time and effort that goes into high-quality test suites), but translating them obviously isn't going to pay the bills by itself.
(If the success doesn't generalize to less favorable situations that do pay the bills, this clearly valuable capability may be repriced to better reflect how much labor and risk it saves relative to a human rewrite.)
> "Yes--sure--one example of each is proof they can do both tasks."
Therefore LLMs are useful. Q.E.D. The claim "people who say LLMs are useful are deluded" is refuted. Readers can stop here, there is no disagreement to argue about.
> "But I take your GP to be suggesting something more like: this success at plumbing a sink inside the framework an existing house with plumbing provides is proof that these things can (or will) build average fully-plumbed houses."
Not exactly; it's common to see people dismiss internet claims of LLMs being useful. Here[1] is a specific dismissal I have in mind, where various people claim that LLMs are useful and the HN commenter investigates and says the LLMs are useless, the people are incompetent, and others are hand-writing a lot of the code. No data is provided for readers to make any judgement one way or the other. Emil taking months to create the Python version could be dismissed this way as well, assuming a lot of hand-writing of code in that time. Small scripts can be dismissed with "I could have written that quickly" or "it's basically regurgitating from StackOverflow".
Simon Willison's experiment is a more concrete example. The task is clearly specified, not vague architecture design. The task has a clear success condition (the tests). It's clear how big the task is and it's not a tiny trivial toy. It's clear how long the whole project took and how long GPT ran for, there isn't a lot of human work hiding in it. It ran for multiple hours generating a non-trivial amount of work/code which is not likely to be a literal example regurgitated from its training data. The author is known (Django, Datasette) to be a competent programmer. The LLM code can be clearly separated from any human involvement.
Where my GP was going is that the experiment is not just another vague anecdote, it's specific enough that there's no room left for dismissing it how the commenter in [1] does. It's untenable to hold the view that "LLMs are useless" in light of this example.
> (repeat) "But I take your GP to be suggesting something more like: this success at plumbing a sink inside the framework an existing house with plumbing provides is proof that these things can (or will) build average fully-plumbed houses."
The example is not proof that these things can do anything else, but why would you assume they can't do tasks of similar complexity? Through time we've gone from "LLMs don't exist" to "LLMs exist as novelties and toys (GPT-1 2018)" to "LLMs might be useful but might not be". If things keep progressing we will get to "LLMs are useful". I am taking the position that we are past that point, and I am arguing that position. We are definitely into the time "they are useful". Other people have believed that for a long time. Not just useful for that task, but for tasks of that kind of complexity.
Sometime between GPT-1 babbling (2018) and today (Q4 2025) the GPTs and the tooling improved from not being able to do this task to yes being able to do this task. Some refinement, some chain of thought, some enlarged context, some API features, some CLI tools.
Since one can't argue that LLMs are useless by giving a single example of a failure, to hold the view that LLMs are useless, one would need to broadly dismiss whole classes of examples by the techniques in [1]. This specific example can't be dismissed in those ways.
> "If the success doesn't generalize to less favorable situations that do pay the bills"
Most bill-paying code in the world is CRUD, web front end, and business logic, not intricate parsing and computer science fundamentals. I'm expecting that "AI slop" is going to be good enough for managers no matter how objectionable programmers find it. If I order something online and it arrives, I don't care if the order form was Ruby on Rails emailing someone who copied the order docs into a Google Spreadsheet using an AI-generated If This Then That workflow, and as long as the error rate and credit card chargeback rate are low enough, neither will the company owners. There are tons of examples of companies having very poor systems and still being in business, but I don't have any specific examples to hand so I won't argue this vehemently - but the world isn't waiting for LLMs to be as 'useful' as HN commenters are waiting for before throwing spaghetti at the wall and letting 'Darwinian Natural Selection' find the maximum level of slop the markets will tolerate.
----
On that note, a pedantic bit about cherry-picking: there's a difference between cherry-picking as a thing, and cherry-picking as a logical fallacy / bad-faith argument. E.g. if someone claims "Plants are inedible" and I point to cabbage and say it proves the claim is false, you say I'm cherry-picking cabbage and ignoring poisonous foxgloves. However, foxgloves existing - and a thousand other inedible plants existing - does not make edible cabbage stop existing. Seeing the ignored examples does not change the fact that the conclusion "plants are inedible" is false, so ignoring them was not bad faith. Similarly, "I asked GPT5 to port the Linux kernel to Rust and it failed" does not invalidate the html5 parser port.
The second sense is bad form; e.g. saying "smoking is good for you, here is a study which proves it" is a cherry-picking fallacy, because if the ignored studies were seen, they would counter the claim "smoking is good for you". Hiding them is part of the argument, deceptively.
"LLMs are useless and only a deluded person would say otherwise" is an example of the former; it's countered by a single example of a non-deluded person showing an LLM doing something useful. It isn't a cherry-picking fallacy to pick one example because no amount of "I asked ChatGPT to port Linux to Rust and it failed" makes the HTML parser stop existing and doesn't change the conclusion.
[1] https://news.ycombinator.com/item?id=45560885
Another skeptic here: I strongly believe that creating new software was always easy. The real struggle is maintaining it, especially for more than one or two years. To this day, I've not seen any arguments, or even a hint of reflection, on how we're going to maintain all this code that LLMs are going to generate.
Even for prototyping, using wireframing software would be faster.
a) why maintain instead of making it all disposable? This could be like a dishwasher asking who is going to wash all the mass-manufactured paper cups. Use future-LLM to write something new which does the new thing.
b) why wouldn't a future-LLM be able to maintain it? (i.e. you ask it to make a change to the program's behaviour, and it does).
The author loves TCL. For prototyping, TCL/Tk is a godsend.
In this year of 2025, in December, I find it untenable for anyone to hold this position unless they have not yet given LLMs a good enough try. They're undeniably useful in software development, particularly on tasks that are amenable to structured software development methodologies. I've fixed countless bugs in a tiny fraction of the time, entirely accelerated by the use of LLM agents. I get the most reliable results simply making LLMs follow the "red test, green test" approach, where the LLM first creates a reproducer from a natural language explanation of the problem, and then cooks up a fix. This works extremely well and reliably in producing high quality results.
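To make that concrete, here is a minimal sketch of that flow, with a hypothetical bug and hypothetical function names invented purely for illustration:

    # "Red test, green test": the agent first turns the bug report into a failing
    # reproducer, and only then is asked for the fix. Everything below is made up
    # to show the shape of the workflow, not taken from any real project.

    # Step 1 (red): the agent turns "empty PORT env var crashes the server" into a
    # reproducer against the original buggy code:
    #
    #   def parse_port(value):
    #       return int(value)        # raises ValueError on ""
    #
    def test_empty_port_falls_back_to_default():
        assert parse_port("") == 8080

    # Step 2 (green): only after watching the reproducer fail is the agent asked
    # for the fix, and the same test is re-run to confirm it now passes.
    def parse_port(value, default=8080):
        value = value.strip()
        return int(value) if value else default

    test_empty_port_falls_back_to_default()   # passes against the fixed version
    print("green")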
You're on the internet, you can make whatever claims you want. But even with no sources or experimental data, you can always lay out some reasoning to add weight to your claims.
> They're undeniably useful in software development
> I've fixed countless bugs in a tiny fraction of the time
> I get the most reliable results
> This works extremely well and reliably in producing high quality results.
If there's one thing common to comments that seem to be astroturfing for LLM usage, it's that they use lots of superlative adjectives in a single paragraph.
You can choose to see it as astroturfing, or see it as people actually thinking the superlatives are appropriate.
To be honest, it makes no difference in my life if you believe or not what I'm saying. And from my perspective, it's just a bit astounding to read people's takes that are authoritatively claiming that LLMs are not useful for software development. It's like telling me over the phone that restaurant X doesn't have a pasta dish, while I'm sitting at restaurant X eating a pasta dish. It's just weird, but I understand that maybe you haven't gone to the resto in a while, or didn't see the menu item, or maybe you just have something against this restaurant for some weird reason.
"X has a pasta dish" is an easily verifiable factual claim. "The pasta dish at X tastes good and is worth the money" is a subjective claim, unverifiable without agreeing on a metric for taste and taking measurements. They are two very different kinds of disagreements.
'It's $CURRENTYEAR' is just a cheap FOMO tactic. We've been hearing these anectodes for multiple current years now. Where is this less buggy software? Does it just happen to never reach users?
Just two more LLM models and two more prompt optimizations.
"high quality results". Yeah, sure. Then I wanted to check this high quality stuff by myself, it feels way worse than the overall experience in 2020. Or even 2024.
Go to docs, fast page load. Than blank, wait a full second, page loads again. This does not feel like high quality. You think it does because LLM go brrrrrrrr, never complains, says your smart. The resulting product is frustrating.
Yikes.
> They're an interesting phenomen that people have convinced themselves MUST BE USEFUL in the context of software development,
Reading these comments during this period of history is interesting because a lot of us actually have found ways to make them useful, acknowledging that they’re not perfect.
It’s surreal to read claims from people who insist we’re just deluding ourselves, despite seeing the results
Yeah they’re not perfect and they’re not AGI writing the code for us. In my opinion they’re most useful in the hands of experienced developers, not juniors or PMs vibecoding. But claiming we’re all just delusional about their utility is strange to see.
It's absolutely possible to be mistaken about this. The placebo effect is very strong. I'm sure there are countless things in my own workflow that feel like a huge boon to me while being a wash at best in reality. The classic keyboard vs. mouse study comes to mind: https://news.ycombinator.com/item?id=2657135
This is why it's so important to have data. So far I have not seen any evidence of a 'Cambrian explosion' or 'industrial revolution' in software.
> So far I have not seen any evidence of a 'Cambrian explosion' or 'industrial revolution' in software.
The claim was that they’re useful at all, not that it’s a Cambrian explosion.
>This is why it's so important to have data.
"In God we trust, all others must bring data."
> It’s surreal to read claims from people who insist we’re just deluding ourselves, despite seeing the results
just imagine how the skeptics feel :p
Thanks for articulating this position. I disagree with it, but it is similar to the position I held in late 2024. But as antirez says in TFA, things changed in 2025, and so I changed my mind ("the facts change, I change my opinions"...). LLMs and coding agents got very good about 6 months ago and myself and a lot of other seasoned engineers I respect finally starting using them seriously.
For what it's worth:
* I agree with you that LLMs probably aren't a path to AGI.
* I would add that I think we're in a big investment bubble that is going to pop, which will create a huge mess and perhaps a recession.
* I am very concerned about the effects of LLMs in wider society.
* I'm sad about the reduced prospects for talented new CS grads and other entry-level engineers in this world, although sometimes AI is just used as an excuse to paper over macroeconomic reasons for not hiring, like the end of ZIRP.
* I even agree with you that LLMs will lead to some maintenance nightmares in the industry. They amplify engineers' ability to produce code, and there a lot of bad engineers out there, as we all know: plenty of cowboys/cowgirls who will ship as much slop as they can get away with. They shipped unmaintainable mess before, they will ship three times as much now. I think we need to be very careful.
But, if you are an experienced engineer who is willing to be disciplined and careful with your AI tools, they can absolutely be a benefit to your workflow. It's not easy: you have to move up and down a ladder of how much you rely on the tool, from true vibe coding for throwaway use-once helper scripts for some dev or admin task with a verifiable answer, all the way up to hand-crafting critical business logic and only using the agent to review it and to try and break your implementation.
You may still be right that they will create a lot of problems for the industry. I think the ideal situation for using AI coding agents is at a small startup where all the devs are top-notch, have many years of experience, care about their craft, and hold each other to a high standard. Very very few workplaces are that. But some are, and they will reap big benefits. Other places may indeed drown in slop, if they have a critical mass of bad engineers hammering on the AI button and no guard-rails to stop them.
This topic arouses strong reactions: in another thread, someone accused me of "magical thinking" and "AI-induced psychosis" for claiming precisely what TFA says in the first paragraph: that LLMs in 2025 aren't the stochastic parrots of 2023. And I thought I held a pretty middle of the road position on all this: I detest AI hype and I try to acknowledge the downsides as well as the benefits. I think we all need to move past the hype and the dug-in AI hate and take these tools seriously, so we can identify the serious questions amidst the noise.
> Skeptic here: I do think LLMs are a fad for software development.
I think that’s where they’re most useful, for multiple reasons:
- programming is very formal. Either the thing compiles, or it doesn’t. It’s straightforward to provide some “reinforcement” learning based on that.
- there’s a shit load of readily available training data
- there’s a big economic incentive; software developers are expensive
> No, that doesn’t make you a skeptic in this context.
That's good to hear, but I have been called an AI skeptic a lot on hn, so not everyone agrees with you!
I agree though, there's a certain class of "AI denialism" which pretends that LLMs don't do anything useful, which in almost-2026 is pretty hard to argue.
On the other hand, ever since LLMs came on the scene, there’s been a vocal group claiming that AI will become intelligent and rapidly bring about human extinction - think the r/singularity crowd. This seems just as untenable a position to hold at this point. It’s becoming clear that these things are simply tools. Useful in many cases, but that’s it.
The AI doomers have actually been around long before LLMs. Discussion about AI doom has been popular in the rationalist communities for a very long time. Look up “Roko’s Basilisk” for a history of one of these concepts from 15 years ago that has been pervasive since then.
It has been entertaining to see how Yudkowsky and the rationalist community spent over a decade building around these AI doom arguments, then they squandered their moment in the spotlight by making crazy demands about halting all AI development and bombing data centers.
> This seems just as untenable a position to hold at this point
To say that any prediction about the future shape of a technology is 'untenable' is pretty silly. Unless you've popped back in a time machine to post this.
Lots of money to be made and power to be grabbed on this safety and alignment moat.
> That's good to hear, but I have been called an AI skeptic a lot on hn, so not everyone agrees with you!
The context was the article quoted, not HN comments.
I’ve been called all sorts of things on HN and been accused of everything from being a bot to a corporate shill here. You can find people applying labels and throwing around accusations in every thread here. It doesn’t mean much after a while.
You can acknowledge both the fad phenomenon and the usefulness of LLMs at the same time, because both are true.
There's value there, but there's also a lot of hype that will pass, just like the AGI nonsense that companies were promising their current/next model will reach.
> They cite cherry picked announcements showing that LLM usage makes development slower or worse. They opened ChatGPT a couple times a few months ago, asked some questions, and then went “Aha! I knew it was bad!” when they encountered their first bad output instead of trying to work with the LLM to iterate like everyone who gets value out of them.
"Ah-hah you stopped when this tool blew your whole leg off. If you'd stuck with it like the rest of us you could learn to only take off a few toes every now and again, but I'm confident that in time it will hardly ever do that."
> "Ah-hah you stopped when this tool blew your whole leg off.
Yes, because everyone who uses LLMs simply writes a prompt and then lets it write all of the code for them without thinking! Vibecoding amirite!?
To be fair, that does seem to be a very common usage pattern for them, to the point where they're even becoming a nuisance to open source projects; e.g. https://news.ycombinator.com/item?id=45330378
Not just their usefulness: LLMs themselves are worse than an illusion. They are illusions that people often believe in unquestioningly - or perhaps are being forced to believe in unquestioningly (because of mandates, or short-term time pressures, as a kind of race to the bottom).
When the ROI in training the next model is realised to be zero or even negative, then yes the money will run out. Existing models will soldier on for a while as (bankrupt) operators attempt to squeeze out the last few cents/pennies, but they will become more and more out of date, and so the 'age of LLMs' will draw to a close.
I confess my skeptic-addled brain initially (in hope?) misread the title of the post as 'Reflections on the end of LLMs in 2025'. Maybe we'll get that for 2026!
I think LLM skeptics come in a variety of styles. I am skeptical of the current amount of capital flowing into AI vs expected returns on that capital. I use AI regularly. Often for free. I'm not clear what the path to profitability looks like, or the justification of valuations. That's why it has dotcom vibes, not because I don't believe in the technology. I don't believe in the snakes peddling the valuations.
You're not a skeptic but you're not fully a supporter either. You live in this grey zone of contradictions.
First, you find them useful but not intelligent. That is a bit of a contradiction. Basically anyone who has used AI seriously knows that while it can be used to regurgitate generic filler and bootstrap code, it can also be used to solve complex domain-specific problems that are not at all part of its training data. This by definition makes it intelligent, and it means we know the LLM understands the problem it was given. It would be disingenuous for me not to mention how often an LLM is wrong and how much it hallucinates, so obviously the thing has flaws and is not superintelligence. But you have to judge the entire spectrum of what it does. It gets things right and it gets things wrong, and getting something complex right makes it intelligent, while getting something wrong does not preclude it from intelligence.
Second, most non-skeptics aren't saying all human work is going to be obsolete. No one can predict the future. But you've got to be blind if you don't see the trendline of progress. Literally look at the progress of AI for the past 15 years. You have to be next-level delusional if you can't project another 15 years forward and see that a superintelligence, or at least an intelligence comparable to humans, is a reasonable prediction. Most skeptics like you ignore the trendline and cling to what Yann LeCun said about LLMs being stochastic parrots. It is very likely that something with human intelligence exists in the future, and in our lifetimes; whether or not it's an LLM remains to be seen, but we can't ignore where the trendlines are pointing.
It's also significantly lowered because management is forcing AI on everyone at gunpoint, and saying that you'll lose your job if you don't love AI.
That's a very easy way to get everyone to pinky promise that they absolutely love AI to the ends of the earth
> The change occurred because LLMs are useful for programming in 2025
But the skeptics and anti-AI commenters are almost as active as ever, even as we enter 2026.
The debate about the usefulness of LLMs has grown into almost another culture war topic. I still see a constant stream of anti-AI comments on HN and every other social platform from people who believe the tools are useless, the output is always unusable, people who mock any idea that operator skill has an impact on LLM output, or even claims that LLMs are a fad that will go away.
I’m a light LLM user ($20/month plan type of usage) but even when I try to share comments about how I use LLMs or tips I’ve discovered, I get responses full of vitriol and accusations of being a shill.
It absolutely is culture war. I can easily imagine a less critical version of myself having ended up in that camp. It comes across to me that the perspective is informed by core values and principles surrounding what "intelligence" is.
I butted heads with many earlier on, and they did nothing to challenge that frame meaningfully. What did change is my perception of the set of tasks that don't require "intelligence". And the intuition pump for that is pretty easy to start — I didn't suppose that Deep Blue heralded a dawn of true "AI", either, but chess (and now Go) programs have only gotten even more embarrassingly stronger. Even if researchers and puzzle enthusiasts might still find positions that are easier for a human to grok than a computer.
> from people who believe the tools are useless, the output is always unusable, people who mock any idea that operator skill has an impact on LLM output
You are attacking a strawman. Almost nobody claims that LLMs are useless or you can never use their output.
Those claims are all throughout this thread and in replies to my comments.
It’s not a strawman. It’s everywhere on HN.
Such as? Currently, the top comments are
> LLMs have certainly become extremely useful for Software Engineers
> LLMs are useful for programming in 2025
> Do LLMs make bad code: yes all the time (at the moment zero clue about good architecture). Are they still useful: yes, extremely so.
If your comment is not a strawman, show me where people actually claim what you say they do.
It's simple. Given the trajectory of these things, people feel under threat and defend themselves accordingly. They say what they hope for, given a number of factors (bad workplaces generating slop they have to deal with, job losses, identity redefinition, etc.) - you know, the things that happen when a profession is disrupted in a capitalist system where 'what you do' is often tied up with identity, status, and livelihood.
People will go from skeptic to dread/anxiety, to either acceptance or despair. We are witnessing the disruption of a profession in real time and it will create a number of negative effects.
"Useful for programming" is a massive and dishonest bait and switch.
Lots of things are "useful for programming". Switching to a comfier chair is more useful for programming than any LLM.
We were sold vibe coding, and that's what managers want.
There is some limited truth in this but we still see claims that LLMs are "just next token predictors" and "just regurgitate code they read online". These are just uninformed and wrong views. It's fair to say that these people were (are!) wrong.
> we still see claims that LLMs are "just next token predictors" and "just regurgitate code they read online". These are just uninformed and wrong views. It's fair to say that these people were (are!) wrong.
I don't think it's fair to say that at all. How are LLMs not statistical models that predict tokens? It's a big oversimplification but it doesn't seem wrong, the same way that "computers are electricity running through circuits" isn't a wrong statement. And in both cases, those statements are orthogonal to how useful they are.
It is just a tell that the person believes LLMs are more than what they are ontologically.
No one says "computers are JUST electricity running through circuits" because no one tries to argue the computer itself is "thinking" or has some kind of being. No one tries to argue that when you put the computer to sleep it is actually doing a form of "sleeping".
The mighty token though produces all kinds of confused nonsense.
> How are LLMs not statistical models that predict tokens?
There are LLMs as in "the blob of coefficients and graph operations that runs on a GPU whenever there's an inference", which is absolutely "a statistical model that predicts tokens", and LLMs as in "the online apps that iterate, have access to an entire automated Linux environment that can run $LANGUAGE scripts and do web queries when an intermediate statistical output contains too many maybes, and use the results to drive further inference".
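A rough sketch of that second sense, with the model replaced by a scripted stub so it runs standalone (the names, messages, and "API" shape here are invented for illustration only):

    import subprocess

    # The "blob of coefficients" is faked with canned replies; in reality each
    # call_model() would be one statistical inference pass over the conversation.
    scripted_replies = iter([
        {"tool": "shell", "args": "echo 9200 tests passed"},   # model asks to run a command
        {"content": "All tests pass."},                         # model gives its final answer
    ])

    def call_model(messages):
        return next(scripted_replies)

    def run_shell(cmd):
        return subprocess.run(cmd, shell=True, capture_output=True, text=True).stdout

    def agent_loop(task):
        messages = [{"role": "user", "content": task}]
        while True:
            reply = call_model(messages)               # "just" next-token prediction
            if reply.get("tool") == "shell":
                # the harness, not the model, runs the tool and feeds the result back in
                messages.append({"role": "tool", "content": run_shell(reply["args"])})
            else:
                return reply["content"]

    print(agent_loop("Run the test suite and report the result."))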
> I don't think it's fair to say that at all. How are LLMs not statistical models that predict tokens? It's a big oversimplification but it doesn't seem wrong
Modern LLMs are trained via reinforcement learning where the training objective is no longer maximum next token probability.
They still produce tokens sequentially (ignoring diffusion models for now) but since the objective is so different thinking of them as next token predictors is more wrong than right.
Instead one has to think of them as trying to fit their entire output to the model learnt in the reinforcement phase. That's how reasoning in LLMs works so well.
It's wrong because it’s deliberately used to mischaracterize the current abilities of AI. Technically it's not wrong but the context of usage in basically every case is that the person saying it is deliberately trying to use the concept to downplay AI as just a pattern matching machine.
I'm a bit confused. You say it's wrong, but then later say it's not wrong. Just because it can be used to downplay advancements in AI doesn't mean that it's wrong, and saying it's wrong because it can be used that way is a bit disingenuous.
[flagged]
You don't understand the meaning of "technically". Also, don't use inflammatory language.
P.S. The response is filled with bad faith accusations.
Look at your response. You first dismissed me completely by saying I don't know what "technically" means. Then you mischaracterized my statement as an intent to inflame. These are highly insulting and dismissive statements.
You're not willing to have a good-faith discussion. You took the worst possible interpretation of my statement and crafted a terse response to shut me down. I only did two things. First I explained myself… then I called you out for what you did while remaining civil. I don't skirt around HN rules as a means to an end, which is what I believe you're doing. I'm ok with what you're doing… but I will call it out.
No surprise that the dishonesty and playing the victim is persistent. It's a fact that this person misuses the term "technically", and that they used inflammatory language. Saying so does not dismiss them completely ... but even if it did, so what? Doing so is not bad faith. No one has any obligation to engage with someone. I won't comment further.
I am not using inflammatory language to hurt anyone. I am illustrating a point on the contrast between technical meaning and non-technical meanings. One meaning is offensive the other meaning is technically correct. Don't start a witch hunt by deliberately misinterpreting what I'm saying.
So technical means something like this: in a technical sense you are a stochastic parrot. You are also technically an object. But in everyday language we don't call people stochastic parrots or objects because language is nuanced and the technical meaning is rarely used at face value and other meanings are used in place of the technical one.
So when people use a term in conversation and go by the technical meaning, it's usually either very strange or done deliberately to deceive. Sort of like how you claim I don't know what "technically" means, and sort of how you deliberately misinterpreted my words as "inflammatory" when I did nothing of the sort.
I hope you learned something basic about English today! Good day to you sir!
> a technical sense you are a stochastic parrot.
I am not. I'm sorry you feel this way about yourself. You are more than a next token predictor.
If I am more than a next token predictor… doesn’t that mean I’m a next token predictor + more? Do you not predict the next word you’re going to say? Of course you do, you do that and more.
Humans ARE next token predictors technically and we are also more than that. That is why calling someone a next token predictor is a mischaracterization. I think we are in agreement you just didn’t fully understand my point.
But the claim that LLMs are next token predictors is the SAME mischaracterization. LLMs are clearly more than next token predictors. Don’t get me wrong, LLMs aren’t human… but they are clearly more than just a next token predictor.
The whole point of my post is to point out how the term stochastic parrot is weaponized to dismiss LLMs and mischaracterize and hide the current abilities of AI. The parent OP was using the technical definition as an excuse to use the word as a means to achieve his own ends, namely being “against” AI. It’s a pathetic excuse. I think it’s clear the LLM has moved beyond a stochastic parrot and there are just a few stragglers left who can’t see that AI is more than that.
You can be “against” AI, that’s fine but don’t mischaracterize it… argue and make your points honestly and in good faith. Using the term stochastic parrot and even what the other poster did in attempt to accuse me of inflammatory behavior is just tactics and manipulation.
Objecting to these claims is missing their point. Saying these things is really about denying that the LLMs "think" in any meaningful sense. (And the retorts I've seen in those discussions often imply very depressing and self-deprecating views of what it actually means to be human.)
Leave it to HN to be militantly misanthropic to sell chatbots.
One only has to go read the original vibe coding thread[0] from ...ten months ago(!) to see the resistance and skepticism loud and clear. The very first comment couldn't be more loud about it.
It was possible to create things in gpt-3.5. The difference now is it aligns with the -taste- of discerning programmers, which has a little, but not everything, to do with technological capability.
[0] https://news.ycombinator.com/item?id=42913909
"Look Ma, no hands!" vibe coding, as described by Karpathy, where you never look at the code being generated, was never a good idea, and still isn't. Some people are now misusing "vibe coding" to describe any use of LLMs for coding, but there is a world of difference between using LLMs in an intelligent considered way as part of the software development process, and taking a hit on the bong and "vibe coding" another "how many calories in this plate of food" app.
Karpathy himself has used "vibe coding" to describe "usage of LLMs for coding," so it's fair to say the definition has expanded.
https://karpathy.bearblog.dev/year-in-review-2025/
Which frankly makes it pretty useless. Describing how I use them at work as "vibe coding" in the same vein as a random redditor generating whatever on Replit is useless. It's a definition so wide as to have no explanatory power.
> The difference now is it aligns with the -taste- of discerning programmers
This... doesn't match the field reports I've seen here, nor what I've seen from poking around the repos for AI-powered Show HN submissions.
On the tabs vs spaces battleground there are no winners; we just need to lower our expectations :)
You just need to hop into any AI-related thread (even this one) and it's pretty clear no one is revising anything; skepticism is there lol.
Yes, it's a strange take. It's not that programmers have changed their mind about unchanging LLMs, but rather that LLMs have changed and are now useful for coding, not just CoPilot autocomplete like the early ones.
What changed was the use of RLVR training for programming, resulting in "reasoning" models that are now attempting to optimize for a long-horizon goal (i.e. bias generation towards "reasoning steps" that during training led to a verified reward), as opposed to earlier LLMs where RL was limited to RLHF.
So, yeah, the programmers who characterized early pre-RLVR coding models as of limited use were correct. Now the models are trained differently and developers find them much more useful.
Agree with this. The RLVR changes (starting with o1, I think) were what changed/disrupted the industry. Before that I thought these things were just better autocomplete.
I thought I'd read a lot of these threads this year, and also discussed off-site the use of coding agents and the technology behind them; but this is genuinely the first time I've seen the term "RLVR".
RLVR "reinforcement learning for verifiable rewards" refers to RL used to encourage reasoning towards achieving long-horizon goals in areas such as math and programming, where the correctness/desirability of a generated response (or perhaps an individual reasoning step) can be verified in some way. For example generated code can be verified by compiling and running it, or math results verified by comparing to known correct results.
The difficulty of using RL more generally to promote reasoning is that in the general case it's hard to define correctness and therefore quantify a reward for the RL training to use.
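As an illustration only (not any lab's actual training code), a toy version of a verifiable reward for code might look like this: run the candidate program against known checks and score pass/fail.

    import os, subprocess, sys, tempfile

    def verifiable_reward(generated_code, test_code):
        """Return 1.0 if the model's code passes the checks, else 0.0."""
        with tempfile.TemporaryDirectory() as d:
            path = os.path.join(d, "candidate.py")
            with open(path, "w") as f:
                f.write(generated_code + "\n" + test_code + "\n")
            result = subprocess.run([sys.executable, path], capture_output=True, timeout=30)
            return 1.0 if result.returncode == 0 else 0.0

    # During RLVR, rollouts whose code earns reward 1.0 reinforce the reasoning
    # steps that produced them; failed rollouts do not.
    candidate = "def add(a, b):\n    return a + b"
    checks = "assert add(2, 3) == 5"
    print(verifiable_reward(candidate, checks))  # 1.0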
> The difficulty of using RL more generally to promote reasoning is that in the general case it's hard to define correctness and therefore quantify a reward for the RL training to use.
Ah, hence the "HF" angle.
RLHF really has a different goal - it's not about rewarding/encouraging reasoning, but rather rewarding outputs that match human preferences for whatever reason (responses that are more on-point, or politer, or longer form, etc, etc).
The way RLHF works is that a smallish amount of feedback data of A/B preferences from actual humans is used to train a preference model, and this preference model is then used to generate RL rewards for the actual RLHF training.
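In toy form, that preference-model step boils down to a pairwise (Bradley-Terry style) objective over the A/B choices; the scores below are made up for illustration.

    import math

    def pairwise_loss(score_preferred, score_rejected):
        # -log sigmoid(score_A - score_B): small when the preference model
        # scores the human-preferred response higher than the rejected one.
        return -math.log(1.0 / (1.0 + math.exp(-(score_preferred - score_rejected))))

    print(pairwise_loss(1.2, 0.4))  # model agrees with the human label: low loss
    print(pairwise_loss(0.1, 0.9))  # model disagrees: higher loss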
RLHF has been around for a while and is what tamed base models like GPT 3 into GPT 3.5 that was used for the initial ChatGPT, making it behave in more of an acceptable way!
RLVR is much more recent, and is the basis of the models that do great at math and programming. If you talk about reasoning models being RL trained then it's normally going to imply RLVR, but it seems there's a recent trend of people calling it RLVR to be more explicit.
> generated code can be verified by compiling and running it
I think this gets to the crux of the issue with LLMs for coding (and indeed 'test orientated development'). For anything beyond the most basic level of complexity (i.e. anything actually useful), code cannot be verified by compiling and running it. It can only be verified - to a point - by skilled human inspection/comprehension. That is the essence of code really: a definition of action, given by humans to a machine, for running with a priori unenumerated inputs. Otherwise it is just a fancy lookup table. By definition, then, not all inputs and expected outputs can be tabulated, tested for, or rewarded for.
I was talking about the RL training process for giving these models coding ability in the first place.
As far as using the trained model to generate code, then of course it's up to the developer to do code reviews, testing, etc as normal, although of course an LLM can be used to assist writing test cases etc as well.