The purpose of the LLM would be to translate natural language into computer language, not to do the calculation itself.

But in that case, /all/ the LLM would be allowed to do is «translate natural language into computer language». And why should it be reliable at that task when it is not reliable in other realms, and how could it be made so?
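To make the concern concrete, here is a minimal sketch of that split, assuming the "computer language" is plain arithmetic. The `evaluate` helper and the two example strings are hypothetical: they stand in for whatever a model might emit for "What is 15% of 240?", and no real model is called.

```python
import ast
import operator

# Whitelisted binary operators; any other syntax is rejected.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate(expr):
    """Deterministically evaluate a plain arithmetic expression string."""

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError("unsupported syntax in translated expression")

    return walk(ast.parse(expr, mode="eval"))


# Hypothetical translations of "What is 15% of 240?" (assumed, not real model output).
faithful = "240 * 15 / 100"
mistranslated = "240 * 15"  # well-formed, but not what was asked

print(evaluate(faithful))
print(evaluate(mistranslated))
```

The evaluator computes both strings without complaint. It can reject output that is malformed or outside the whitelist, but it has no way of telling that the second string does not mean what the question asked. The calculation is reliable; the translation step is exactly as reliable as the LLM.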