I believe the sweet spot that makes it practical and reliable will be combining LLMs with formal verification, although I doubt current hardware is up to the task (yet).
LLMs basically solve the classic frame problem that prevented general problem solvers from reasoning logically about the real world; on their own, however, they are utterly unpredictable and unreliable.
But if the model's weights are used merely as a heuristic to guide a logical reasoning engine toward promising regions of the problem space, and the program itself is derived directly from the specification by an inference engine, the result would be classical software, unaffected by hallucinations.
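A toy sketch of that division of labour, in Python and under loud assumptions: the tiny expression language is invented for illustration, `helpful_rank` and `misleading_rank` are hypothetical stand-ins for an LLM's guess, and `verify` is an exhaustive check over a bounded integer range standing in for a real proof engine. All it tries to show is that the ranking changes how many candidates get checked, never which ones are accepted.

```python
from typing import Callable, List, Optional, Tuple, Union

# Hypothetical toy language: "x", a small integer, or (op, left, right).
Expr = Union[str, int, tuple]
LEAVES = ("x", 1, 2)
OPS = ("+", "*")


def evaluate(expr: Expr, x: int) -> int:
    """Evaluate an expression tree at the integer x."""
    if expr == "x":
        return x
    if isinstance(expr, int):
        return expr
    op, left, right = expr
    a, b = evaluate(left, x), evaluate(right, x)
    return a + b if op == "+" else a * b


def candidates(depth: int) -> List[Expr]:
    """Enumerate every expression up to the given nesting depth."""
    level: List[Expr] = list(LEAVES)
    found: List[Expr] = list(level)
    for _ in range(depth):
        level = [(op, a, b) for op in OPS for a in level for b in LEAVES]
        found += level
    return found


def verify(expr: Expr, spec: Callable[[int], int], domain: range) -> bool:
    """Assumption: a bounded exhaustive check stands in for a formal verifier."""
    return all(evaluate(expr, x) == spec(x) for x in domain)


def synthesize(
    spec: Callable[[int], int],
    rank: Callable[[Expr], float],
    depth: int = 3,
    domain: range = range(-50, 51),
) -> Tuple[Optional[Expr], int]:
    """Try candidates in the order the heuristic suggests; only verify() accepts."""
    checked = 0
    for expr in sorted(candidates(depth), key=rank):
        checked += 1
        if verify(expr, spec, domain):
            return expr, checked
    return None, checked


def spec(x: int) -> int:
    """The specification: the program must compute (x + 1)^2."""
    return (x + 1) ** 2


def helpful_rank(expr: Expr) -> float:
    """Stand-in for a good LLM guess: prefer candidates that fit a few samples."""
    return sum(abs(evaluate(expr, x) - spec(x)) for x in (0, 1, 2))


def misleading_rank(expr: Expr) -> float:
    """Stand-in for a 'hallucinating' heuristic: prefers the worst fits."""
    return -helpful_rank(expr)


if __name__ == "__main__":
    for name, rank in (("helpful", helpful_rank), ("misleading", misleading_rank)):
        program, checked = synthesize(spec, rank)
        print(f"{name}: {program} after {checked} verifier calls")
```

With either ranking, whatever `synthesize` returns has passed `verify`; a bad heuristic only makes the search slower. A real system would replace the bounded check with an SMT solver or proof assistant and the brute-force enumeration with something far smarter.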
The LLM could even help debug the specifications by pointing out unclear or contradictory requirements, improving the process without compromising the integrity of the result.