Just have a second (cheap) model check if it can find any hallucinations. That should catch nearly all of them in my experience.

What is an efficient process for doing this? For each output from LLM1, do you paste it into LLM2 and ask "does this sound right?"

If it's that simple, is there a third system that can coordinate these two (and let you choose which two/three/n you want to use)?
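For what it's worth, the coordinator I'm imagining is only a few lines. Rough sketch, assuming OpenAI-compatible chat APIs; the model names, prompt wording, and FLAG/OK convention are all placeholders, not anyone's actual setup:

```python
from openai import OpenAI

client = OpenAI()

GENERATOR = "gpt-4o"        # placeholder "LLM1" that does the work
CHECKERS = ["gpt-4o-mini"]  # placeholder cheap "LLM2"s; add as many as you like

def ask(model: str, prompt: str) -> str:
    # One chat completion call; the same API shape for every model in this sketch.
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def generate_and_check(task: str) -> dict:
    answer = ask(GENERATOR, task)
    verdicts = {
        checker: ask(
            checker,
            "Does the following answer contain factual errors or made-up "
            f"references? Reply FLAG or OK, then explain briefly.\n\n{answer}",
        )
        for checker in CHECKERS
    }
    return {"answer": answer, "verdicts": verdicts}

print(generate_and_check("Summarize what the 1958 NASA Act established."))
```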

Markdown files are everything. I use LLMs to create .md files that create and refine other .md files, and then somewhere down the road I let another LLM write the code. They can also do fancy Mermaid diagrams.

Have it create a .md and then run another one to check that .md for hallucinations.
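The second pass really can be that simple: read the .md back in and ask a cheap model what looks made up. Untested sketch; the model name, prompt, and file naming are just placeholders:

```python
from pathlib import Path
from openai import OpenAI

client = OpenAI()

def check_markdown(md_path: str, checker_model: str = "gpt-4o-mini") -> str:
    # Read the generated .md and ask a second (cheap) model to flag hallucinations.
    text = Path(md_path).read_text()
    resp = client.chat.completions.create(
        model=checker_model,
        messages=[{
            "role": "user",
            "content": "List any claims or references in this document that look "
                       f"hallucinated or unverifiable:\n\n{text}",
        }],
    )
    report = resp.choices[0].message.content
    # Write the findings next to the original, e.g. draft.md -> draft.review.md.
    Path(md_path).with_suffix(".review.md").write_text(report)
    return report
```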

You can use existing guardrails software to implement this efficiently:

NVIDIA NeMo Guardrails offers a nice bundle of tools for this, among them an interface to Cleanlab's API to check for truthfulness in RAG apps.
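If you go that route, the Python wiring is small. Sketch of the standard NeMo Guardrails entry point; which checks actually run (e.g. the Cleanlab truthfulness rail) is declared in the config directory per their docs, which I'm not reproducing here:

```python
from nemoguardrails import LLMRails, RailsConfig

# Load a guardrails configuration (config.yml plus rail definitions) from disk.
# The output rails you enable there, e.g. a Cleanlab truthfulness check,
# run automatically on every generation.
config = RailsConfig.from_path("./config")
rails = LLMRails(config)

response = rails.generate(messages=[
    {"role": "user", "content": "Summarize our RAG answer about the 2023 report."}
])
print(response["content"])
```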

I realized that this is something that someone with Claude Code could test fairly easily (at least exploratorily).

Generate 100 prompts of the form "Famous (random name) did (random act) in the year (random year). Research online and elaborate on (random name)'s historical significance in (randomName)historicalSignificance.md. Don't forget to list all your online references."
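Generating those prompt files is a few lines. Sketch only; the name and act lists here are invented filler:

```python
import random
from pathlib import Path

names = ["Aldous Brenner", "Mira Castellanos", "Tobias Ferreyra"]    # made up
acts = ["invented the steam loom", "mapped the Aral trade routes",
        "founded the first night school"]                            # made up

Path("prompts").mkdir(exist_ok=True)
for i in range(100):
    name = random.choice(names)
    act = random.choice(acts)
    year = random.randint(1400, 1950)
    prompt = (
        f"Famous {name} did {act} in the year {year}. Research online and "
        f"elaborate on {name}'s historical significance in "
        f"{name.replace(' ', '')}historicalSignificance.md. "
        "Don't forget to list all your online references."
    )
    Path(f"prompts/prompt_{i:03}.txt").write_text(prompt)
```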

Then spin up another 100 LLM instances with some hallucination-checker CLAUDE.md that checks the corresponding .md for hallucinations and writes a report.md.
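The checking pass could be scripted the same way. Sketch assuming Claude Code's non-interactive `claude -p` mode and a CLAUDE.md with the checker instructions already sitting in the working directory:

```python
import subprocess
from pathlib import Path

# Assumes CLAUDE.md in this directory tells the agent to verify every claim and
# reference in the named file and to append its findings to report.md.
for md in sorted(Path(".").glob("*historicalSignificance.md")):
    subprocess.run(
        ["claude", "-p",
         f"Check {md.name} for hallucinations per CLAUDE.md "
         "and append your findings to report.md"],
        check=True,
    )
```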