I hope this kind of stuff puts to rest the idea that we're close to actual AGI. Outsourcing this kind of basic stuff, which a real intelligence would be able to do "internally", is a hack that works for this specific case but would prevent further generalization of the task at hand.
But I foresee the opposite. This kind of tool use will soon be integrated and hidden, such that people will eventually say "see, we solved the problem that AI can't do 123+456, now we are really, really close to AGI." Yeah, no. With an AGI, it would have been the AGI itself that came up with the need for a tool, built the tool, and then used the tool. But that's not what LLMs are. They are statistical machines that predict tokens. They are very good at it, but that's not an AGI.