> that’s good enough as far as I’m concerned

But in that case, why an LLM? If we want question-answering machines to be reliable, they must have the basic skills, "counting" being just one example.

The purpose of the LLM would be to translate natural language into computer language, not to do the calculation itself.
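
A minimal sketch of that split, under loud assumptions: `translate` is a hypothetical stand-in for the LLM (here just a lookup table of canned outputs, not a real model call), and the tiny expression language with its `count` helper is invented for illustration. The only point is that the counting and arithmetic happen in ordinary deterministic code.

```python
import ast
import operator


def translate(question: str) -> str:
    """Hypothetical stand-in for the LLM: natural language in, expression out.

    A real system would call a model here; this lookup table is an assumption.
    """
    canned = {
        "How many times does 'r' appear in 'strawberry'?": "count('strawberry', 'r')",
        "What is 12 times 34 plus 5?": "12 * 34 + 5",
    }
    return canned[question]


# The only operations the evaluator will run (an illustrative whitelist).
_BIN_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}
_FUNCS = {"count": lambda text, sub: text.count(sub)}


def evaluate(expr: str):
    """Deterministically evaluate a whitelisted expression; reject anything else."""

    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, str)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](walk(node.left), walk(node.right))
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in _FUNCS
            and not node.keywords
        ):
            return _FUNCS[node.func.id](*[walk(arg) for arg in node.args])
        raise ValueError("expression not allowed")

    return walk(ast.parse(expr, mode="eval").body)


def answer(question: str):
    # The "LLM" only translates; the computer does the calculation.
    return evaluate(translate(question))
```

With this arrangement the evaluator cannot miscount, so the translation step is the only place an error can enter.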

But in that case, /all/ the LLM would be allowed to do would be to «translate natural language into computer language». And why should it, and how could it, be reliable at that when it is not reliable in other realms?