I wonder how much this was a factor:

"The occurrence of inaccurate “hallucinations” by the current [LLMs] poses a significant challenge for their use in education. [...] we enriched our prompts with comprehensive, step-by-step answers, guiding the AI tutor to deliver accurate and high-quality explanations (v) to students. As a result, 83% of students reported that the AI tutor’s explanations were as good as, or better than, those from human instructors in the class."

I'm not dismissing the study at all, but if you want to replicate these results for yourself, this level of gain over a classroom setting may be tricky to achieve unless someone first prepares class materials for the bot to present to you.
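To make the "enriched our prompts with comprehensive, step-by-step answers" part concrete, here is a rough sketch of what that preparation might look like. This is my own guess, not the authors' code: `WorkedProblem`, `build_tutor_prompt`, and the prompt wording are all hypothetical, and it only assembles a prompt string without calling any model.

```python
from dataclasses import dataclass


@dataclass
class WorkedProblem:
    """One instructor-written problem plus its vetted solution (hypothetical structure)."""
    question: str
    solution_steps: list[str]


def build_tutor_prompt(problem: WorkedProblem) -> str:
    """Assemble a system prompt that grounds the tutor in the instructor's solution."""
    # Number the steps so the model can follow them in order.
    steps = "\n".join(
        f"{i}. {step}" for i, step in enumerate(problem.solution_steps, start=1)
    )
    # The instruction wording below is an assumption, not taken from the study.
    return (
        "You are a tutor. Guide the student with hints and questions; "
        "do not reveal the full solution unprompted.\n"
        "Base every explanation on the instructor's solution below.\n\n"
        f"Problem: {problem.question}\n\n"
        f"Instructor solution:\n{steps}\n"
    )


if __name__ == "__main__":
    example = WorkedProblem(
        question="What is 2 + 3?",
        solution_steps=["Start at 2.", "Count up 3.", "The result is 5."],
    )
    print(build_tutor_prompt(example))
```

The point is that somebody has to write `solution_steps` for every problem up front, which is exactly the part you don't get for free when studying on your own.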

Edit: the authors further say

"Krupp et al. (2023) observed limited reflection among students using ChatGPT without guidance, while Forero (2023) reported a decline in student performance when AI interactions lacked structure and did not encourage critical thinking. These previous approaches did not adhere to the same research-based best practices that informed our approach."

Two other studies failed to get positive results at all. YMMV a lot, apparently (as in, all bets are off, and your learning might even go in the negative direction if you don't do everything exactly as in this study).

In case you find it interesting: I deployed an early version of a "lesson administering" bot on a college campus. It guides students through tutored activities built on content curated by a professor, in the "study mode" style, that is, forcing them to think for themselves. We saw an immediate gain in student exam performance of about 1 standard deviation in the course. So with the right material and the right prompting, things are looking promising.

OpenAI should figure out how to onboard teachers. The teacher uploads context for the year, and OpenAI distributes a chatbot to the class that's permanently fixed in study mode. Basically like the GPT store, but with an interface and UX tuned for a classroom.