> The non-hallucination rate in AA-omniscience is SOTA
Note that a perfect "non-hallucination rate" is rather meaningless, as such tests can contain human hallucinations.
A perfect score only means the model aligns with the possibly-true, possibly-false beliefs of the group that made the test.
Well, yes, garbage in, garbage out. That's a given and not what's meant by "hallucination" in this context.
The observation goes beyond garbage in, garbage out. The main point is that we're always operating from some prior and a limited understanding, so what may look like a hallucination could be closer to the truth than our current frameworks of understanding allow us to admit. That's the hermeneutic circle.
Interesting. I wonder if current LLMs can break out of human limitations and understand the world more correctly.
Here are some examples of the questions in the benchmark. If these are representative, they seem pretty cut and dried. https://artificialanalysis.ai/evaluations/omniscience#exampl...
Was there something about this specific model and submission that made you feel compelled to write this self-evident observation?
Or would you describe your methodology as more like picking a random sentence fragment as an input value, then generating completions from your existing corpus without any post-input "learning" process related to the rest of the source material?