That METR study gets a lot of traction for its headline, and I doubt many people read the whole thing (it was long), but the data showed a 50% speedup for the one dev with the most experience with Cursor/AI, which suggests both a learning curve and wild statistical variation on a small sample. An erratum later noted that another dev who showed no speedup had misstated their experience level, which further calls the significance of the findings into question.

The specific time sinks measured in the study are addressable with better technology, like faster LLMs, and better methodology, like running parallel agents; the study was done in March using Claude 3.7, before Claude Code existed.

We should also value the perception of having worked 20% less, even if you actually spent more time. Time flies when you're having fun!

It was a great study, imo, and it rightly deserves a lot of attention as a welcome alternative to Twitter vibes.

But stopping here and saying AI coding doesn't work will just not hold up well. The sample is 16 developers across 250 tasks.
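
To make the small-sample point concrete, here is a toy sketch. Every per-developer number below is invented for illustration (they are not the study's actual estimates); it just shows how sensitive a 16-developer estimate is, both to one experienced outlier and to resampling:

```python
# Toy illustration of small-sample variation with 16 developers.
# All slowdown factors below are made up; >1.0 means slower with AI,
# and the lone 0.67 loosely mirrors "one dev saw a ~50% speedup".
import random

random.seed(0)

slowdowns = [1.3, 1.2, 1.4, 1.1, 1.25, 1.35, 1.15, 1.2,
             1.3, 1.1, 1.4, 1.25, 1.2, 1.15, 1.3, 0.67]

def mean(xs):
    return sum(xs) / len(xs)

# One developer noticeably moves the point estimate.
print(f"mean slowdown with the experienced dev: {mean(slowdowns):.2f}")
print(f"mean slowdown without them:             {mean(slowdowns[:-1]):.2f}")

# Bootstrap: resample the 16 developers with replacement and look at
# how wide the spread of the resampled means is.
boot_means = sorted(
    mean([random.choice(slowdowns) for _ in slowdowns])
    for _ in range(10_000)
)
lo, hi = boot_means[249], boot_means[9_749]  # ~95% interval
print(f"95% bootstrap interval for the mean: [{lo:.2f}, {hi:.2f}]")
```

With only 16 units of analysis, intervals like this stay wide enough that a single developer's learning curve can plausibly move the headline number.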

Here is my data point: adding AI to your engineering team really is like adding a new member. At first it slows everybody down: explain this, explain that, stupid mistakes an experienced dev would have avoided. But over the months you learn how to get the most out of them: they speak every programming language fluently, and they don't mind writing 20 tests or bringing standard industry approaches to your problems.

Over time I have come to appreciate that the market is rather smart and even hype has its place. You often have to cross a line to know you crossed it, especially with something as fuzzy and fast-changing as technological progress hitting the messy real world. But then again, we need the heretics to keep us honest and stop us from following an emperor with no clothes.

Just putting this up as a reference for the next time this comes up on HN. The study data shows the median task took about 1.5 hours. The 15 minutes developers thought they saved was real in one sense: they spent more than that much less time researching, testing, and writing code. But they actually finished about 15 minutes slower, with the extra time going to sitting idle (~5 min), waiting on the AI (~5 min), and to reviewing and prompting, which came to dominate their work over actually writing code.
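
As rough arithmetic on those figures (these are the round numbers cited above, not precise study values):

```python
# Back-of-the-envelope on the numbers above: median task ~1.5 hours,
# devs felt ~15 minutes faster, were measured ~15 minutes slower.
median_task_min = 90
perceived_delta_min = -15   # felt saving
actual_delta_min = 15       # measured loss

gap_min = actual_delta_min - perceived_delta_min
print(f"perception gap: {gap_min} min "
      f"({gap_min / median_task_min:.0%} of a median task)")
# -> perception gap: 30 min (33% of a median task)
```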

The person who showed a speedup reported over a week of prior experience with Cursor, while all the others reported under a week.