Can I ask what your task and application are? A ~20% failure rate sounds atypical. If you’re being slightly hyperbolic and mean something like 2-5%, then yeah, that’s a property of LLMs, but it’s also heavily affected by how you prompt and how you constrain the task.
An auditing/QA step (a grading checklist, a verification pass, etc.) can get you further. Likewise for a planning step.
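To make the auditing idea concrete, here is a rough sketch of a generate, audit, retry loop. Everything named here is hypothetical: `generate` stands in for whatever function calls your model, and the checklist entries are plain Python predicates you would write for your own task. It's one possible shape under those assumptions, not a definitive implementation.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical checklist entry: (description of the requirement, check function).
Check = Tuple[str, Callable[[str], bool]]


def audit(output: str, checklist: List[Check]) -> List[str]:
    """Return the descriptions of every check the output fails."""
    return [desc for desc, passes in checklist if not passes(output)]


def generate_with_audit(
    generate: Callable[[str], str],  # assumption: wraps your own LLM call
    prompt: str,
    checklist: List[Check],
    max_attempts: int = 3,
) -> Optional[str]:
    """Generate, audit against the checklist, and retry with feedback.

    Returns the first output that passes every check, or None if all
    attempts fail.
    """
    current_prompt = prompt
    for _ in range(max_attempts):
        output = generate(current_prompt)
        failures = audit(output, checklist)
        if not failures:
            return output
        # Feed the failed checks back so the next attempt can correct them.
        feedback = "\n".join(f"- {desc}" for desc in failures)
        current_prompt = f"{prompt}\n\nFix these issues:\n{feedback}"
    return None
```

The checks here are deterministic functions, but a check could just as well be a second model call acting as a grader. Returning `None` after the last attempt lets the caller decide whether to fall back, escalate to a human, or log the failure.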