I've been doing things like accounting, where I upload receipts and have the LLM update a Google Sheet with the balances. Over the past year the error rate has dropped from occasional to zero. That's because there are now sub-agents running that check the work. If you have multiple LLMs that each succeed 94% of the time, and you put them in a group that has to reach a consensus, the number suddenly hits basically 99%.
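A rough back-of-the-envelope check of that 94% to 99% jump, assuming three checkers whose errors are independent and a simple majority vote counts as "consensus" (both are my assumptions; real sub-agents sharing a model probably make correlated mistakes, so treat this as an upper bound). The function name `majority_success` is just made up for illustration:

```python
from math import comb


def majority_success(p: float, n: int) -> float:
    """Probability that a strict majority of n checkers is correct.

    Assumes each checker is right with probability p, independently
    of the others (an idealization, not a property of real LLMs).
    """
    need = n // 2 + 1  # smallest number of correct votes that forms a majority
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(need, n + 1)
    )


# Three checkers at 94% each: 0.94^3 + 3 * 0.94^2 * 0.06
print(round(majority_success(0.94, 3), 4))  # prints 0.9896
```

So under those assumptions three 94% checkers land at roughly 99%, which matches what I've seen. If the checkers tend to fail on the same inputs, the gain will be smaller.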
We simply need to run sub-agents on the children's learning, and then we will maximize pedagogic efficiency to 99%.