So given that the output of an LLM is unreliable at best, your plan is to verify that an LLM didn't bullshit you by asking another LLM?

That sounds... counterproductive