I presume you're referring to the recent METR study. One aspect of the study population that seems like an important causal factor in the results is that the participants were working in large, mature codebases with specific standards for code style, which libraries to use, and so on. LLMs are much better at producing "generic" results than at matching a very specific and idiosyncratic set of requirements. The study involved the latter (specific) situation; helping people learn mainstream material seems more like the former (generic) situation.
(Qualifications: I was a reviewer on the METR study.)