This motivates the question: if you're doing all this work to verify the LLM, is the LLM really saving you any time?

After just a few weeks in this brave new world, my answer is: it depends, and I'm not really sure.

I think that over time, as the LLMs get better and I get better at working with them, I'll start trusting them more.

One thing that would help is for them to become a lot less random and less sensitive to their prompts.