AI used to write homework should be banned.
AI in 1:1 tutor mode with proper hardware (live-scanning pen and paper), harness and guardrails should be wildly successful (in terms of educational outcomes), especially in elementary school.
Disagree. AI has no business being used in 1:1 tutor mode before the hallucination and sycophancy issues are completely resolved. As is, I can easily see it being a hindrance to building actual understanding.
Just one example: it's very common to see ChatGPT and the like respond with "you're absolutely correct! Great insight" to something that is a complete misunderstanding.
This is specifically a consumer-model (or specifically ChatGPT) issue. E.g., IME Codex does not do this; it will just tell you when you're missing something or are somehow wrong. And Gemini does this weird thing where it tells you you're a genius and then immediately starts correcting everything you said.
Sycophancy is just one aspect of the problems I mentioned, though. Another huge one is hallucination, and one that is actually far worse than I thought:
> It’s been proven that when a model is trained on large volumes of highly factual and non-theoretical data, it learns to always have an answer. DeepSeek V4 Pro (1.6T params, 49B active, 44 AA Intelligence Index score) has a ludicrous 94% hallucination score on the AA-Omniscience benchmark, meaning on questions that it couldn’t figure out, it only stated that it didn’t know around 6% of the time, and the rest it confidently hallucinated an answer. GLM-5.2 scored a 28% hallucination rate, Opus 4.8 was 36%, Fable 5 was 48%, and GPT-5.5 was 86%.
https://arrowtsx.dev/bigger-models/
I think even a 5% hallucination rate would be terrible for a teacher, who should generally be comfortable with saying "I don't know off the top of my head but here is how to find resources to answer your question".
---
So, just to drive the point home: Codex has an 86.9% hallucination rate on the AA-Omniscience benchmark in this index https://benchlm.ai/models/gpt-5-3-codex. If you ask it something that wasn't sufficiently covered in its training data, it will confidently make up an answer nearly 87% of the time.
While you might think it is happy to correct you when you are wrong, you don't know that for sure since you don't know when you're wrong. Codex may have been happily agreeing with you about things you had completely backwards.
Except I generally do know when I'm wrong because I'm working in a domain I am familiar with, and it will often create experiments on the fly unprompted (well, prompted, but generically in AGENTS.md) to check itself. My experience actually using it for software is that it almost never makes up answers. The answer for hallucinations is fairly simple: give it facts and tools to ground itself.
> Except I generally do know when I'm wrong because I'm working in a domain I am familiar with, and … My experience actually using it for software is that it almost never makes up answers.
Yes, I am certain that it feels that way. However, empirical testing carries a lot more weight than anecdotes.
> The answer for hallucinations is fairly simple: give it facts and tools to ground itself.
The entire danger here is that it hallucinates when you don’t know the ground facts. After all, you don’t know what you don’t know.
> Gemini does this weird thing where it tells you you're a genius and then immediately starts correcting everything you said.
That's a great way to get you to listen because your guard is down. Imagine if it told you you were an idiot and then corrected you.
Just realized 1:1 AI is '90s self-esteem, medals-for-everyone parenting on steroids.
Teachers hallucinate too. I’ve had creationists and communists and tin-foil-hat (chemtrails, 5G, etc.) teachers. Surely you can imagine an AI tutor with a higher-than-zero ROI.
> I’ve had creationists and communists and tin-foil-hat (chemtrails, 5G, etc.) teachers.
I certainly have, too, but there is still a difference between a person who has a factually incorrect but consistent worldview and an LLM which simply reflects the worldview of the user or even changes between queries.
I don't think creationists have any business being in schools either, for what it's worth, but I think it's easier for a teenager to sort out "Mr. Smith has no clue what he's talking about" vs "I have no clue what's true because the LLM everyone expects me to learn from just confirms everything I ask regardless of what I'm asking".
A big part of education is (or should be) independent learning with textbooks and reading. You don't need to be "tutored".
Tutors should be able to approximate the ZPD (zone of proximal development) better than any student can. Most students lack intrinsic motivation and it's a tutor's job to help them get started.
Okay, I was generalising from experience; most students I've met lack intrinsic motivation. School systems typically encourage students to depend on extrinsic motivation.
That’s rather disingenuous. But it seems nowadays that words have lost meaning… so, I don’t blame you. I blame the LLMs for this deterioration.
lol scraping the bottom of the barrel
Tutor mode sucks because even if it were actually accurate and didn't hallucinate, it isn't scoped to what the class is actually covering within a given topic.
Best AI is still your own brain, trained on paying attention in class and reading the assigned content.
One-to-one human tutors are the best way of educating kids, and they are not scoped to what a class is covering. They teach the kid as an individual: they are effective because they can adjust to the kid's abilities and interests.
> AI in 1:1 tutor mode with proper hardware, harness and guardrails should be wildly successful
I’m open to the idea! Show me the evidence. Then we can roll it out to our kids.
Glad you’re asking. https://cepr.org/publications/dp21577
“AI adoption raises homework scores by 18% and reduces completion time by 30%, but lowers monthly exam scores by 20% within six months. High-stakes entrance-exam scores fall by 18 and 24%, with the full penalty emerging only after about two years.”
Yup. Short-term metrics juice. Actual comprehension and cognition falls. This seems to be the case across the board, including with adults.
I’m genuinely optimistic that there is a way to make AI helpful in education. I just don’t think we’ve found it yet. (We certainly haven’t demonstrated it.)
> reduces completion time by 30%
This is probably the big problem, or at least one of them. If you spend less time on learning, it will probably also be harder to remember what you learned. We need to spend some time to make it stick.
The behavioral issue I see is that LLM users tend to immediately reach for an LLM and do their thinking in concert with it.
This tempts users to approach problems by first feeding them into the LLM and then simply following the route the LLM lays out, which does improve task completion time for tasks that the LLM can simply regurgitate, but it stops the user from developing the actual critical thinking skills that school is supposed to teach.
It’s not just critical thinking skills; there’s also a big difference between recognition/following instructions and recall/generating your own memories of an approach. But most students don’t recognize the difference. In other words, “following the route” is a big part of the problem: it doesn’t engage the brain the same way and isn’t representative of real-world use, and having something explained well doesn’t mean you can in turn explain it well yourself (the more revealing test of true, internalized understanding).
Can agree on that.
The description of the paper also said:
> AI users who maintain similar homework completion time as non-AI users experience small learning losses.
This was a surprise to me. I would have thought otherwise.
Would love to see some evidence on whether more or fewer people fall behind and have worse results. In my head the AI should be able to get the weakest students a bit higher.
> In my head the AI should be able to get the weakest students a bit higher
I think the evidence so far is all students lose learning and cognition, but the brighter students lose less.
Based on the paper, it hits the brighter students harder.
From https://cepr.org/publications/dp21577 :
> The losses are largest in social science subjects, followed by STEM and languages, and are especially large for junior students, high-achieving students, and boys.
No mention of the weakest students, which probably means they did not score significantly worse or better.
> reach for an LLM and do their thinking in concert with it
This basically models how intellectual work is going to be overwhelmingly done in the future.
With AI — +18%, without AI — -20%. Conclusion?
I think AI could (and by some students probably already is) be used to help a student better understand the material, and faster than you could before. I still recall some parts of Physics taking a while to click, and often having to reread different sections of a textbook to try and understand the what and why behind something.
The biggest issue is that a child has to want to do that, since they could also just ask the AI for the answer and then go back to playing video games. At the end of the day, past age 13 or so, I just don't see legislation making any difference; they'll find a way past any law blocking them from using AI. Like a lot of education it'll probably come down to parenting.
> I think AI could (and by some students probably already is) be used to help a student better understand the material, and faster than you could before
I think so too. But we haven’t demonstrated we’ve found how, in kids or in adults.
> biggest issue is
We genuinely don’t know what the biggest issue is. We just know it doesn’t work. There is zero quality evidence for AI helping with learning or cognition in kids or adults. (Happy to be proven wrong. This is a fast-moving and big field.)
> they'll find a way past any law blocking them from using AI. Like a lot of education it'll probably come down to parenting
And community. Rich towns restrict devices in school, monitor use at home and thus will have less of an issue with AI exposure.
> I think so too. But we haven’t demonstrated we’ve found how, in kids or in adults.
Ask ChatGPT or Claude, on their highest model (probably unnecessary, but I'm sensing a vibe), to explain a simple linear algebra problem, and if you don't understand it, ask about what part you don't understand.
And if you truly believe it made something up, prove it.
This is seriously the easiest thing to prove out there, you can see for yourself in the next 5 minutes.
> And if you truly believe it made something up, prove it.
You seem to be assuming that the issue is around factual correctness, and that may be the case but the evidence we have so far doesn't support jumping to such a narrow cause.
Is the poor performance because the LLMs are frequently wrong? Unknown.
Is it because the LLMs are sycophantic? Unknown.
Is it because the chat interface is a poor one for learning? Unknown.
What we do know is that students who rely on LLMs learn less and perform worse in the long term. And that alone is enough evidence to support a ban. If better tools come along in the future and are shown to aid learning, then the ban can be re-evaluated.
Sure. I already know linear algebra. If it’s a new branch of mathematics, this is a terrible way to learn.
Again, the research points almost exclusively in one direction when it comes to learning and cognition around AI. You’ll solve more problems more quickly but wind up learning and thinking less.
Honestly, what do you think a teacher does that an llm can't? If you want to learn, you absolutely can ask an LLM how to _solve_ x and to explain the steps and why.
My leaving out the word "solve" seems to have led some of you astray; I apologize.
Again, the problem is that you have the option to solve your problem and move on without understanding it. That does not mean you can not use the tool to understand the problem and how to _solve_ it.
I live in fear that instead of learning how to use the tool, some might just vote to ban the tool.
> what do you think a teacher does that an llm can't?
I don’t know! It’s an interesting question. All we know is it does.
> That does not mean you can not use the tool to understand the problem and how to _solve_ it
It doesn’t. But we have no evidence it can.
We have lots of evidence of people thinking they’ve learned something, taking a benchmarking test, and being found wholly deficient compared to folks who worked through a textbook, went to a class or even solved problems off YouTube videos or instructional websites.
lol people like this guy prove AI psychosis is legit.
If you can’t figure out what the value-add of a human teacher is, then... fkin lmao. It’s well beyond simply transmitting information.
The best teachers have passion - that passion is infectious. I was lucky enough to experience that and it grew my curiosity.
LLMs provide no such equivalent.
> best teachers have passion
Not everyone has the “best teachers.” And passion is undefined. This is not a real argument.
> to explain a simple linear algebra problem, and if you don't understand it, ask about what part you don't understand.
The goal is not to understand a linear algebra problem. The goal is to learn how to solve it using lessons and techniques taught beforehand. I.e., not to be given a fish, but to learn how to fish.
I'm sorry the wording of my post didn't match what you wanted.
Type in "Explain how to solve a simple linear algebra problem" into the AI of your choice instead.
> Type in "Explain how to solve a simple linear algebra problem" into the AI of your choice instead
I’m more interested in seeing how someone who teaches themselves with this approach scores on a standardized exam of linear-algebra competence.
> Type in "Explain how to solve a simple linear algebra problem" into the AI of your choice instead.
I’ve seen this particular philosophy in college, where students focus exclusively on passing exams. They would memorize notes and past exercises. The focus is on solving a particular set of exercises instead of understanding the concepts. Change things slightly and they’re lost.
That may not matter in college where you can focus on a few disciplines and half-ass the rest. But everything in lower stages is truly foundational.
A crucial part of learning is struggling with understanding and overcoming problems by yourself. AI removes that part.
> AI users who maintain similar homework completion time as non-AI users experience small learning losses.
Seems like there's no benefit even if it's used "correctly"?
Care to give us the bits you found interesting in the paper to spare me plonking down £6?
Would hate to dissect this just off a paragraph.
Considering that the paper concludes that even students who take the long approach and use LLMs in the most appropriate way for learning still retain less over the long term than students who simply don't use LLMs, I think it's likely they didn't read the paper in the first place.
fwiw, Alpha School is the supervised version. the New York campus is $65k/yr and not legally a school.
private school money with homeschool paperwork and an app doing the teaching.
https://www.wired.com/story/alpha-schools-new-york-city-camp...
We thought the same of electronic devices in general and digital learning content specifically. In actual practice both result in lowered test scores and declining critical thinking skills.
I think AI should be used in higher level schools but with the added requirement that the output will be held to a much higher standard and that it's fact checked. Teach the students to use AI to reach a higher level while mitigating the inherent issues like hallucination and sycophancy.
Idk why you screeching AI touts are so confident about its ‘wild’ success in all areas given absolutely zero evidence to that effect.
It’s tiresome.
It's inevitably your fault for prompting incorrectly or using the wrong model.
"You just have to repeat the prompt 3 times and then spin around counter-clockwise twice! That always works for me. You obviously just don't know how to prompt the model correctly."
Every time I see LLM enjoyers yapping on like this, it just reminds me of people trying to read tea leaves. There's all these goofy little rules about how to structure the prompt and how mean or nice to be to get it to work optimally, but I think it's obvious that most of these users are just seeing incidental successful outcomes in a largely random system and extrapolating from there because it makes them feel in control.
It is, quite literally, superstition.
Instead of prompts, let’s call them incantations.