The study was designed to have devs who are comfortable with AI perform 50% of tasks with AI and 50% without. So the problem is that the population of "developers who use AI regularly but are willing to do tasks without AI" is shrinking.
>> Are they worried that by splitting devs into groups of AI experience they might be measuring some confounder that causes people to choose AI / not AI in their careers?
The developer sample was small (16 people in the original study), while the task sample is larger (~250 tasks). I think the worry is that if you compared one group of developers against another, variance in developer productivity would totally wash out any signal.
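To make the variance worry concrete, here is a rough simulation sketch, not the study's actual analysis. Only the 16 developers and roughly 250 tasks come from the numbers above; the effect size, the noise levels, and the names `simulate` and `spread` are all made up for illustration. It compares a within-developer design (each dev does half their tasks with AI) against a between-developer split (half the devs always use AI, half never do).

```python
import random
import statistics


def simulate(n_devs=16, tasks_per_dev=16, true_effect=-0.2,
             dev_sd=0.5, task_sd=0.3, seed=0):
    """Return (within_estimate, between_estimate) of the AI effect on log task time.

    All parameter values are illustrative assumptions, not study data.
    16 devs x 16 tasks = 256 tasks, close to the ~250 mentioned above.
    """
    rng = random.Random(seed)
    # Each developer gets a fixed baseline speed (log scale).
    dev_skill = [rng.gauss(0.0, dev_sd) for _ in range(n_devs)]

    def log_time(dev, uses_ai):
        shift = true_effect if uses_ai else 0.0
        return dev_skill[dev] + shift + rng.gauss(0.0, task_sd)

    # Within design: every dev does half their tasks with AI, half without,
    # so each dev's baseline cancels out of their own difference.
    per_dev_diffs = []
    for d in range(n_devs):
        ai = [log_time(d, True) for _ in range(tasks_per_dev // 2)]
        no_ai = [log_time(d, False) for _ in range(tasks_per_dev // 2)]
        per_dev_diffs.append(statistics.mean(ai) - statistics.mean(no_ai))
    within = statistics.mean(per_dev_diffs)

    # Between design: first half of devs always use AI, second half never.
    # Baseline differences between the two groups leak into the estimate.
    half = n_devs // 2
    ai_group = [log_time(d, True)
                for d in range(half) for _ in range(tasks_per_dev)]
    no_ai_group = [log_time(d, False)
                   for d in range(half, n_devs) for _ in range(tasks_per_dev)]
    between = statistics.mean(ai_group) - statistics.mean(no_ai_group)
    return within, between


def spread(n_runs=500):
    """Standard deviation of each design's estimate across repeated runs."""
    results = [simulate(seed=s) for s in range(n_runs)]
    within_sd = statistics.stdev([r[0] for r in results])
    between_sd = statistics.stdev([r[1] for r in results])
    return within_sd, between_sd


if __name__ == "__main__":
    within_sd, between_sd = spread()
    print(f"within-developer design:  sd of estimate = {within_sd:.3f}")
    print(f"between-developer design: sd of estimate = {between_sd:.3f}")
```

Under these assumed numbers the between-developer estimate is much noisier, because with only 8 devs per group the baseline skill differences don't average out. How big the gap is in practice depends entirely on how much real developers vary, which this sketch just guesses at.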
An alternative hypothesis might be "Developers who consistently use AI become unable to work without AI". It used to be well known that after a year or two away from writing code, a new manager would be a much worse dev than before. Is a similar sort of skill shift happening? If we raise a cohort of new devs who never work without AI, do they never gain the ability to work without it?