The difference is that LLMs mess with our heuristics. Those heuristics certainly aren't infallible, but over time we develop a sense for when someone is full of shit. The mix-and-match nature of LLMs hides that.

You need different heuristics for LLMs. If the answer is suspiciously plausible and agreeable, yet not grounded in facts you can verify, alarm bells should go off.

A bit like the movie trope where the protagonists get suspicious because the antagonists agree to every demand during negotiations, since they plan to betray them anyway.

The LLM will hallucinate the most likely-sounding scenario that conforms to your input and wishes.

I do not claim a high P(detect | hallucination), but my P(hallucination | detect) is pretty good: I don't catch every hallucination, but when the alarm goes off it's usually right.
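
To make that distinction concrete: P(detect | hallucination) is recall (what fraction of hallucinations you catch), while P(hallucination | detect) is precision (how often your suspicion is correct). A minimal sketch with made-up counts, purely for illustration:

    # Illustrative only: hypothetical counts of answers I reviewed, not real data.
    hallucinated_and_flagged = 8    # hallucinations my heuristics caught
    hallucinated_and_missed = 12    # hallucinations that slipped past me
    correct_but_flagged = 2         # false alarms on answers that were fine

    # P(detect | hallucination): recall -- how many hallucinations get caught.
    recall = hallucinated_and_flagged / (hallucinated_and_flagged + hallucinated_and_missed)

    # P(hallucination | detect): precision -- how often a raised alarm is right.
    precision = hallucinated_and_flagged / (hallucinated_and_flagged + correct_but_flagged)

    print(f"P(detect | hallucination) = {recall:.2f}")     # 0.40: misses plenty
    print(f"P(hallucination | detect) = {precision:.2f}")  # 0.80: but alarms are usually right

The two can differ wildly: the sketch above misses most hallucinations yet is right four times out of five when it does raise a flag, which is the claim being made here.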