>I'm using Claude every day, and it definitely makes me faster but..

I see a lot of posts about this, and I see a lot of studies, also on HN, showing that this isn't the case.

Now of course the "this isn't the case" finding is statistical, so there can be individual developers who are faster. It can also be that an individual developer is sometimes faster and sometimes not, but the times they are faster are so clearly faster that it hides the times they're not. Statistics of performance over a number of developers can flatten things out. But I don't know that that is the case.

So my question for you, and everyone who claims it makes them so perceptibly and clearly faster - how do you know? Given all the studies showing that it doesn't make you faster, how are you so sure it does?

It's incredibly frustrating arguing these same points, over and over, every time that this comes up. You're asking people who are experienced developers absolutely chewing through checklists and peeking at HN while compiling/procrastinating/eating a sandwich/waiting for a prompt to finish to not just explain but quantify what is plainly obvious to those people, every day. You want us to bring paper receipts, like we have some incentive to lie to you.

From our perspective, the gains are so obvious that it really does feel like you must just be doing something fundamentally wrong not to see the same wins.

So when someone says "I can't make it do the magic that you're seeing" it makes me wonder why you don't have a long list of projects that you've never gotten around to because life gets in the way.

Because... if you don't have that list, to us that translates as painfully incurious. It's inconceivable that you don't have such a list because just being a geek in this moment should be enough that you constantly notice things that you'd like to try. If you don't have that, it's like when someone tells you that they don't have an inner monologue. You don't love them any less, but it's very hard not to look at them a bit differently.

> You want us to bring paper receipts, like we have some incentive to lie to you

Of course, you might be heavily financially invested in AI companies, so you might indeed have an incentive to hype up the tech.

Or you might be relying on your hype to drive traffic to your blog

I don't have an investment portfolio, and I don't have a blog. But I appreciate that you're actually thinking about what I said, so thanks for that.

That said, if I did have a blog, hyping AI would not have the desired effect given the amount of negative attention it gets these days.

>It's incredibly frustrating arguing these same points, over and over,

Quite frankly, there seems to be something incredibly frustrating going on in your life, but I'm not sure the underlying cause of whatever is weighing on your mind at the moment is that I asked "how do you know that what you are feeling is actually true, compared to what studies show should be true?" (rephrased, as it's not reasonable to quote the whole post)

>From our perspective, the gains are so obvious that it really does feel like you must just be doing something fundamentally wrong not to see the same wins.

From my perspective, when I think I am experiencing something that data from multiple sources tell me is not actually happening, I try to figure out how I can prove what I am experiencing. I reflect on myself: have I somehow deluded myself? No? Then how do I prove it, when analysis of many situations similar to my own shows a different result?

You seem to think what I mean is people saying "Claude didn't help me, it wasn't worth it." No, just to clarify (although I thought it was really clear): I am talking about the numerous studies constantly posted on HN, which I'm sure you must have seen, where productivity gains from coding agents do not seem to actually show up in the work of those who use them. Studies conducted by third parties observing the work, not claims made by the people performing the work.

I'm not going to go through the rest of your post. I understand the urge to be insulting, especially as a stress release if you've had a particularly bad time recently. But frankly, statistically speaking, my life is almost certainly significantly worse than yours, and for that reason, though not that reason alone, I will quite confidently state, with hardly any knowledge of you specifically but just my knowledge of my own life and of the people I've met throughout it, that my list dwarfs yours.

To lay it out, I'm pretty firmly pro-AI.

Putting it succinctly, these kinds of conversations feel weird because it's like asking whether carpenters are faster using power tools or hand tools. If you've used power tools, it's obvious they make work a lot faster. Maybe there were studies around the time power tools were introduced looking at the productivity of carpenters; if those studies found that the productivity gains weren't obvious in the data, that means there's a problem with the study and the data collected (which is totally understandable, since measuring imprecise things like productivity accurately is really hard). You have to look at the evidence in front of you, though: try telling the guy with a chainsaw that he's actually no more productive than he was with an axe and he'll laugh at you.

This takes the cake for one of the strangest replies I've ever received on here.

I'm not sure how or indeed why you draw lines from what I said to my life situation... which is relevant how?

What I apparently did not do a good enough job of conveying is that those "data from multiple sources" get cited and then people immediately reply with "those are old studies". It's circular in the same way that arguing with anti-vax people is circular.

The difference is that unlike vaccines, it's very easy for someone to see how productive they are when using LLMs properly. It's not a subtle difference.

Hence the frustration with people who keep insisting that we're imagining our own productivity. It's not a good faith inquiry.

OK, glad to hear I was mistaken, but it certainly seemed like about halfway through your first response you went off the rails and decided to take my question as some sort of personal affront. It was not the strangest response I've had on HN, but one of the strangest. I could go through with a full analysis of why I thought "this guy is having problems," but that would take a long time, and since you say you aren't, I guess it isn't particularly useful.

I guess we aren't going to get anything meaningful between us on this subject, because you seem to think it is like arguing with an anti-vaxxer, which, funnily enough, is what I thought as well.

So fine, you experience a gain, you just do, and it is so clear and evident that you don't need to guard yourself against being deluded, despite studies suggesting that the gain is not there. That seems crazy to me; I would doubt and want to verify my gain if I read a study suggesting it was illusory. No meaningful convergence seems possible between needing verification and not needing it.

I like remus' comment to your previous message; you're telling a guy with a chainsaw who is busy chopping down trees at lightning speed that he should stop and defend his daily experience against some studies that suggest tree chopping speeds are not what they seem.

At some point you just have to shrug and get back to work chopping down 3-5x more trees than you did last year.

Writing software is not chopping trees, though.

For instance, there is a lot of evidence (and intuition, frankly) behind the argument that while LLMs increase superficial, short-term productivity, they also cause an extreme accumulation of technical debt that may more than wipe out any initial fast progress down the line.

If you aren't reviewing the changes it's proposing, you deserve what's coming to you.

> It's incredibly frustrating arguing these same points, over and over, every time that this comes up. You're asking people who are experienced developers absolutely chewing through checklists and peeking at HN while compiling/procrastinating/eating a sandwich/waiting for a prompt to finish to not just explain but quantify what is plainly obvious to those people, every day. You want us to bring paper receipts, like we have some incentive to lie to you.

This puts what I have been feeling in the recent months into words pretty concisely!

To me, it really is a force multiplier: https://news.ycombinator.com/item?id=47271883

Of course, I still have to pay attention to what the AI is doing and figure out ways to automate more code checks, but the gradual trend in my own life is more AI, not less: https://blog.kronis.dev/blog/i-blew-through-24-million-token... (though letting it run unconstrained/unsupervised is a mess; I generally like to have Claude Code create a plan and iterate on it with Opus 4.6, then fire off a review. Since getting the Max subscription, I don't really need Cerebras or other providers, though I still appreciate them)

At the same time I've seen people get really bad results with AI, often on smaller models, or just expecting to give it vague instructions and get good results, with no automated linters or prebuild checks in place, or just copying snippets with no further context in some random chat session.

Who knows, maybe there's a learning curve and a certain mindset you need to benefit from the technology, such that something like 80% of developers see marginal gains or even a detriment, which is what shows up in most of the current studies. A bit like how, for a while, microservices and serverless were all the rage architecturally and most people did an absolutely shit job of implementing them, before (hopefully) enough collective wisdom was gained about HOW to use the technology and when.

Totally! Though I maintain that the only good aspect to microservices is that krazam video. You know the one.

I do get frustrated when I see people not using Plan steps, copy/pasting from web front-ends or expecting to one-shot their entire codebase from a single dense prompt. It's problematic because it's not immediately obvious whether someone is still arguing like it's late 2024, you know what I mean?

Also, speaking for myself, I can't recommend that anyone use anything but Opus 4.5 right now. 4.6 has a larger context window, but it's crazy expensive when that context window actually gets used, even though most agree that these models get dumber with a super-large context. 4.5 actually scores slightly better than 4.6 on agentic development, too! But using less powerful models means literally using tools that are much more likely to produce the sorts of results that skeptics think apply across the board.

Haven't looked into 4.5 vs 4.6 in depth (since the latter seems good for my needs), but

> but it's crazy expensive

was something I struggled with until just going for the Max subscription and cancelling my other ones.

I'm not sure what Anthropic is doing, but they're either making truckloads of money from those paying per-token (especially since you're not supposed to use subscriptions for server use cases --> devs can use Claude Code, but not code review bots etc.), or heavily subsidizing subscriptions.

100 USD is worth it for me; I've only hit the 5-hour limits a few times, and haven't hit 100% of the weekly limit once. I dread to think how much comparable usage of any of the Opus models would have cost if I were paying per token - even Sonnet could get similarly expensive.

I don't get/like/want Claude Code. I do everything in Cursor, and I am very happy. I recommend it! And there are no time-based limits. You get deeply discounted API calls included in your monthly subscription, and then overage is billed at the same discounted rate. It's essentially committing to an "at least" amount per month in exchange for a preferred rate.

I have a US$200/month Cursor plan, and I do hundreds of hours' worth of Opus 4.5 prompting with it every month. I tend to pay $250-300 a month after overages, and I consider myself a heavy user. During the Opus 4.1 days, one month I paid $700. 4.5 got substantially cheaper and smarter, and I consider that the moment agentic coding got real.

I don't know your financial situation and I recognize that $300/month is more than much of the world makes in a month. I am just saying that for me, what I'm working on is important enough that I am absolutely willing to pay a premium for access to the best tooling available, because every dollar I spend represents literally an hour of my time. Maybe more? It's so incredibly cheap compared to hiring an unreliable human who needs to sleep.

You can't pay someone $3600/year to lick stamps, much less to pair-program on application development.

That's pretty cool! I haven't really been a heavy user of Cursor, but found Cline/RooCode/KiloCode in VSC to be pretty good, while letting me preserve my existing setup and also easily switch between multiple providers, sometimes in the middle of some work, to let another model check the output of the first one!

I think the most I ever spent per month was 300 USD, but I had to cut down on that, and Anthropic's subscription is way more affordable than paying per token (alongside GitHub Copilot, which also has multi-model support and pretty generous limits, plus unlimited autocomplete). I'm also helping a friend with expenses during their chemo and some other friends with meds and such, and even though policemen and teachers and others have far worse financial circumstances than software devs in Latvia, the economy here doesn't give much breathing room for that kind of thing.

Oh, for a while I was also using Cerebras Code, which gives you really generous token limits (like 24M per day on the 50 USD/month tier), though the GLM 4.7 model I tried still made me go back and fix its output more often than I'd like. Eventually I kinda settled on SOTA.

That said, I do remember a post here on HN where some founders were debating whether they should throw something like 1000+ USD at Anthropic (the API variety) per month, and they realized that for them that amount of money was totally reasonable compared to hiring some junior devs or whatever.

I read that same post, and for me it wasn't just something I remember; it had a profound impact on how I came to be typing at you casually about how I have spent up to $700 a month on Opus tokens in Cursor (which absolutely lets you switch between providers... I just really like Opus 4.5!)

To me, all of the switching between dev environments plus all of the time spent undoing errors caused by less powerful models has a huge time cost; not to be cliché, but that means it's very expensive to use error-prone models and to obsess over trying all of the new half-baked things (I've never even heard of most of the stuff you mentioned, lol). Like, if I spend an hour of my time mucking around with some tool, that's a good chunk of the $200/month I commit to Cursor.

Anyhow, at the real risk of sounding like an unpaid Cursor salesman, IMO it's worth every penny. For me, the jury is still out on whether people find Opus 4.6's 5x context to be valuable enough to pay significantly more for it over 4.5, which again is rated as being slightly better at agentic coding than 4.6. Since agentic coding is what I do....

I'm a principal engineer and have been working on the same set of codebases for almost 10 years. I handle the inbound work that takes up 20% or so of my time faster than ever, and I know because that inbound volume has clearly increased and yet I have, for the first time ever, begun chipping away at the "nice to have" backlog. My biggest time sinks now are interviewing and code reviews -- the latter being directly proportional to the velocity increase across the teams I work with. Actually, that's my biggest concern -- we are approaching a breaking point for code review volume.

Sorry I don't have DX stats or token usage stats I can share, but based on the directives from on high, those stats are highly correlated (in the positive).

[edit] And SEV rates are not meaningfully higher.

Thanks, this seems like pretty useful information.

I'm assuming that by "inbound volume clearly increased" you mean something like "we've been handling more tickets than ever before over the last few quarters," or something along those lines.

I've read about this code review effect before, and it tends to feed into those studies suggesting that the whole process takes the same amount of time overall. But for that to be the case, individual code reviews would have to take longer, whereas for you it's just a volume increase because more tickets are being pushed through.

Is there anything notable about your ticketing strategy? For example, do you make your tickets much more atomic than the many teams that say they do but then end up with things that could be split into two or three tickets? How much time do you spend getting tickets ready for development / ready for AI?

Just trying to identify behavioral patterns in your successful usage that would explain the success. Given the example of ticket throughput over a long period, I suppose we can assume the gain is not illusory.

> everyone who claims it makes them so perceptibly and clearly faster - how do you know?

For me, AI tools act like supercharged code search and autocomplete. I have been able to make changes in complex components that I have rarely worked on. It saved me a week of effort in finding the exact API calls that would do what I needed. The AI tool wrote the code, and I only had to act as a reviewer. Of course, I am familiar with the entire project and I knew the shape of the code to expect, but it saved me from digging out the exact details.

> For me, AI tools act like supercharged ... search and autocomplete.

I think that is a fairly good definition of what an LLM is. I'd say the third leg of the definition is adjustable randomness.
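To spell out what I mean by that third leg: the usual knob is a temperature parameter on token sampling. Here's a rough sketch of how that works (made-up function name, not any particular library's API):

```typescript
// Hypothetical sketch of "adjustable randomness": temperature-scaled token sampling.
function sampleToken(logits: number[], temperature: number): number {
  // Temperature rescales the logits before softmax: low values sharpen the
  // distribution (near-deterministic), high values flatten it (more random).
  const scaled = logits.map((l) => l / Math.max(temperature, 1e-6));
  const maxLogit = Math.max(...scaled);
  const exps = scaled.map((l) => Math.exp(l - maxLogit)); // subtract max for numerical stability
  const sum = exps.reduce((a, b) => a + b, 0);
  const probs = exps.map((e) => e / sum);

  // Draw one token index according to the resulting probabilities.
  let r = Math.random();
  for (let i = 0; i < probs.length; i++) {
    r -= probs[i];
    if (r <= 0) return i;
  }
  return probs.length - 1;
}
```

Near zero temperature it almost always picks the top token; crank it up and the output gets more varied.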

Fair question! I've wondered that myself; there is always the possibility that the productivity gain is in my head. I'm not AI-pilled; if these things disappeared tomorrow I would probably just shrug. I'm just trying to keep up to date.

Where I find it makes me faster is in writing low-value code that's repetitive, which I might normally procrastinate on. Like, the thing I'm working on is a data editor that generates a lot of fields, so having it churn out a lot of samey React code is useful to me in that context. There's already an obvious pattern for the tools to follow.
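For concreteness, the kind of samey code I mean looks roughly like this (component and field names are invented, just to show the pattern the tools can follow):

```typescript
// Hypothetical example of the repetitive field pattern; names are made up.
import React from "react";

interface FieldProps {
  label: string;
  value: string;
  onChange: (value: string) => void;
}

// One small, boring component...
function TextField({ label, value, onChange }: FieldProps) {
  return (
    <label>
      {label}
      <input value={value} onChange={(e) => onChange(e.target.value)} />
    </label>
  );
}

// ...repeated for every field in the editor, differing only in label and key.
function ItemEditor({ item, update }: { item: Record<string, string>; update: (key: string, value: string) => void }) {
  return (
    <form>
      <TextField label="Name" value={item.name} onChange={(v) => update("name", v)} />
      <TextField label="Description" value={item.description} onChange={(v) => update("description", v)} />
      <TextField label="Category" value={item.category} onChange={(v) => update("category", v)} />
      {/* ...dozens more fields following the exact same shape */}
    </form>
  );
}
```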

I also find it useful for "rubber ducking". Bouncing ideas I might previously have bugged a colleague about.

By faster I'm not suggesting a fanciful number for me. Maybe like 10 to 20 percent if I were to guess.

> I see a lot of posts about this, and I see a lot studies, also on HN, that show that this isn't the case.

Most of these studies were done one or more years ago, and predate the deployment and adoption of RLHF-based systems like Claude. Add to that, the AI of today is likely as bad as it's ever going to be (i.e., it's only going to get better). Though I do think the 10x claims are probably unfounded.

I mean, obviously the things one reads about will always be a little bit behind. This is one of the claims I sometimes see about these studies: that they are out of date, and that if they had used the new models they would have found otherwise. But then that is one of the recurring claims one also sees about LLMs, that the newest model fixes whatever issue one is complaining about. And then the claim gets reiterated.

The thing is, when I use an AI I sort of feel these gains, but nothing great; it's like, wow, it would have taken me days to write all this reasonable albeit sort of mediocre code. I mean, that is definitely a productivity gain, because a lot of the time you just need to write mediocre code. But there are parts where I would not have written it like that. So if I go through fixing all those parts, how much of a gain did I actually get?

Like most posters on HN, I am a conceited jerk, so I can claim that I have worked with lots of mediocre programmers (while ignoring the times when I was the mediocre one, thinking "oh, that didn't count, I just followed the documentation and the suggested way to use the API, and that turned out to be a stupid thing to do"), and I certainly didn't fix everything they did, because there just weren't enough hours in the day.

And they did build stuff that worked, much of the time, so now I've got an automated version of that. Sweet. But how do I quantify the productivity, given that there are claims put forth with statistical backing that the productivity is illusory?

This is just one of those things that tends to affect me badly: I think X is happening, a study shows X does not happen. Am I drinking too much Kool-Aid here, or is X really happening?! How do I prove it?! It is the kind of theoretical, logical problem seemingly designed to drive me out of my gourd.