+1 to treating LLM output as untrusted input.
The failure mode I keep seeing isn’t hallucination per se… it’s blurred responsibility between intent and execution. Once a model can both decide and act, you’ve already lost determinism.
The stable pattern seems to be: LLMs generate proposals, deterministic systems enforce invariants. Creativity upstream, boring execution downstream.
What matters in production isn’t making models smarter, it’s making failure modes predictable and auditable. If you can’t explain why a state transition was allowed, the architecture is already too permissive.
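A minimal sketch of what I mean (the action names, invariants, and caps are all made up): the model only ever produces a proposal, and a deterministic layer decides whether it's allowed and records exactly which invariant, if any, blocked it.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Proposal:
    action: str
    amount: int  # e.g. a refund the model wants to issue (hypothetical)

AUDIT_LOG = []

# Invariants are plain, inspectable predicates -- not model output.
INVARIANTS = {
    "known_action": lambda p: p.action in {"refund", "credit"},
    "amount_capped": lambda p: 0 < p.amount <= 100,
}

def apply(proposal: Proposal) -> bool:
    """Allow the transition only if every invariant holds; log the reason either way."""
    failed = [name for name, check in INVARIANTS.items() if not check(proposal)]
    AUDIT_LOG.append({"proposal": proposal, "allowed": not failed, "failed": failed})
    return not failed

apply(Proposal("refund", 50))    # allowed
apply(Proposal("refund", 5000))  # rejected: amount_capped
```

The point is that "why was this allowed?" always has a boring answer: here's the invariant list, here's the log entry. The model never touches that path.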