>One mediocre paper/study (it should not even be called that with all the bias and sample size issues)

Can you bring up any specific issues with the METR study? Alternatively, can you cite a journal that critiques it?

It was just published. It's too new for anyone to have conducted a study that directly critiques it, and journals don't generally publish standalone critiques anyway; it would have to be a study that disputes the results.

They used only 16 developers. With a sample that small the confidence intervals are wide, and a few atypical issues per developer could swing the headline figure; the quick simulation below illustrates the point.
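
A rough sketch of why, with invented numbers (this is not METR's data or estimator): it shows how a bootstrap confidence interval over 16 developers stays wide, and how a handful of outlier issues for a single developer can move the aggregate figure.

```python
# Illustrative only: made-up numbers, not the study's data or method.
import numpy as np

rng = np.random.default_rng(0)
n_devs = 16
issues_per_dev = 8  # hypothetical

# Hypothetical per-issue log time ratios (AI time / no-AI time),
# centered near zero effect.
log_ratios = rng.normal(loc=0.05, scale=0.4, size=(n_devs, issues_per_dev))

def headline(log_ratios):
    # Aggregate estimate: mean log ratio, back-transformed.
    # Positive means "slower with AI".
    return np.exp(log_ratios.mean()) - 1

print(f"Point estimate: {headline(log_ratios):+.1%}")

# Add three unusually slow AI-assisted issues for one developer.
perturbed = log_ratios.copy()
perturbed[0, :3] += 1.0
print(f"After 3 outlier issues for one dev: {headline(perturbed):+.1%}")

# Bootstrap over developers (the independent units) to gauge CI width.
boot = []
for _ in range(5000):
    sample = rng.choice(n_devs, size=n_devs, replace=True)
    boot.append(headline(log_ratios[sample]))
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"95% bootstrap CI: [{lo:+.1%}, {hi:+.1%}]")
```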

The participants were veteran maintainers working on projects they know inside out; that alone is a source of bias, since the results may not generalize beyond that setting.

The devs supplied the issue list themselves (it was randomized afterwards), which still leaves room for subtle self-selection bias: maintainers may pick tasks they enjoy or that showcase deep repo knowledge, exactly where AI probably has the least marginal value.

Time was self-reported rather than independently logged.

There is no direct quality metric. Could the AI-assisted code have been better?

The Hawthorne effect: knowing they are being observed and paid may lead devs to over-document, over-prompt, or simply take their time.

Many of the devs were new to Cursor, so unfamiliarity with the tool may have inflated their AI-assisted times.

Finally, the forecasting component (developers predicting how much AI would help them) is itself open to bias.