As one of the naysayers who talked a lot about the original study, I enthusiastically endorse any attempt at all to actually measure AI productivity. An increase from 20% slowdown to 20% speedup over the past year seems broadly consistent with my understanding of how things have gone. I think I remain classified as a "naysayer", though, because the "booster" case has gone from "I'm multiple times more productive" to "I never have to look at code, my AI agents just handle everything" over the same period.

I think the issue was incomplete context. Even before the original METR study came out, there were a number of larger-scale studies that showed a 15-30% boost, starting as far back as 2024. I often mention them, though they require some explanation, so this thread and linked comments may be useful: https://news.ycombinator.com/item?id=46559254

However, those studies never got as much airtime as the METR study, and this has created an imbalanced perspective.

My take is that studies like this are extremely useful, but a lagging indicator of the true extent of AI-assisted coding. Especially since the latest tools are something else entirely.

I am not at the "never look at code again" stage, the old habits are just too ingrained... but I'm starting to look less frequently because I rarely find anything to fix. I can see a path from where I'm at to the outlandish claims people have been making.

I tried the "don't look too closely" thing for the first time last week. I got immediately humiliated when a reviewer asked why my commit was trying to replace the correct, elegant usage of an API the class was named after with a 4-line franken-command using a different API with incorrect semantics. It's not like I'm not trying the new stuff; on a subjective level I think AI coding is really neat, but I just can't ever figure out how to map what I get to the stories I hear.

It depends what you're measuring.

Don't get me wrong, my experiments with true-vibe-coding (i.e. don't even look at the code) match yours: the result is somewhat mediocre*.

For some cases (and I try to push beyond the limits of what LLMs can do in order to find those limits), they suck. I'd describe the output as like that of an overenthusiastic junior who reinvents the wheel badly rather than using standard approaches even when told to.

For other cases, I know that mediocre code is actually good enough: well before LLMs happened, I saw mediocre code that still resulted in the app itself being given meaningful public accolades.

* Though, as per a previous comment of mine, I can't help but notice that the mediocrity is doing more and more of my previous career: https://news.ycombinator.com/item?id=46989102

Oh yeah, I can see that happening, which is why I still scan the code! However, one thing I'll add is that AI-assisted coding requires adapting your workflow. Fortunately, it largely boils down to coding best practices on steroids: docs, tests, tooling like linters, etc.

I throw tests at everything, even minor functions, preferably integration, maybe even some E2E with Playwright in web apps, at least for the happy paths. I actually pay more attention to the tests. The amazing thing is that the AI writes all of these and uses them as feedback to fix its mistakes.

But these validation guardrails are what has been driving down the issues I encounter. Without these the AI can make mistakes, and hence will require more in-depth manual review.
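
To make the guardrail idea concrete, here is a minimal sketch of the kind of happy-path test I mean, written pytest-style in Python. The `slugify` helper is purely hypothetical, made up for illustration and not from any real project; the point is only that even a minor function gets a check the AI can run and react to.

```python
import re


def slugify(title: str) -> str:
    """Hypothetical minor helper: lowercase the title, join its words with hyphens."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)


def test_slugify_happy_path():
    # The common case: punctuation and spacing collapse to single hyphens.
    assert slugify("Hello, World!") == "hello-world"


def test_slugify_empty_title():
    # A cheap edge case that is easy to get wrong in a rewrite.
    assert slugify("") == ""
```

Assuming pytest is the runner, the agent can run the suite after each change and treat a failing assertion as the signal to fix its own mistake before a human ever reviews it.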

You just have to give up and drink the Kool-Aid...

But for real... My company started tracking commits per hour as a metric so I just commit as many times as I can. I don't get the luxury of even looking at my work now. They say it's faster but I've never seen so much tech debt delivered so quickly in my life.

It's going to be an interesting few years...

Definitely need to stop squashing commits if that is the case! But no, seriously, tracking git commit counts is absolutely ridiculous. Maybe you can have AI autonomously work on useless documentation that no one will read, with 1 commit per 100 lines of markdown?