This model has the best score on that benchmark.

Edit: Huh... It does score highest in "Omniscience", but it also scores very high in Hallucination Rate (where a higher score is worse)...

This has one of the worst scores in AA-Omniscience Hallucination Rate.