Or I cannot trust a contrived laboratory setting with it's garden of forking paths.

https://mleverything.substack.com/p/garden-of-forking-paths-...

I did not say to trust it. I do not need to trust it.

If I run my own tests on my own codebase I will definitely use some objective time measurement method and a subjective one. I really want to know if there is a big difference.

I really wonder if its just the individuals bias showing. If you are pro-AI you might overestimate one, and if you are against it you might under-estimate it.

That's fair, I agree.