Hacker News

throawayonthe 12 hours ago

well there is https://artificialanalysis.ai/evaluations/omniscience

goldenarm 11 hours ago

It's a gibberish-input detection benchmark; it does not measure output hallucinations.