> There are plenty of papers out there that look at LLM productivity and every one of them seems to have glaring methodology limitations and/or reports on models that are 12+ months out of date.
This is a general problem with papers measuring productivity in any sense. It's often hard to define what "productivity" means and to figure out how to measure it. But also, any study with worthwhile results will:
1. Probably take some time (perhaps months or longer) to design, get funded, and get through an IRB.
2. Take months to conduct. You generally need enough participants to say anything meaningful, and you may want to survey them over a period of weeks or months.
3. Take months to analyze, write up, and get through peer review. That's kind of a best case; peer review can take years.
So I would view these studies as necessarily time-boxed snapshots, given the practical constraints of doing the work. And if LLM tools change every year, as they have, good studies will always lag and may always feel out of date.
It's totally valid to not find a lot of value in them. On the other hand, people all-in on AI have been touting dramatic productivity gains since ChatGPT first arrived, so it's reasonable to have some historical measurements to go with the historical hype.
At the very least, it gives our future agentic overlords something to talk about on their future AI-only social media.