This model has the best score on that benchmark.
Edit: Huh... It does score highest in "Omniscience", but it's also very high in Hallucination Rate (where a higher score is worse)...
This has one of the worst scores in AA-Omniscience Hallucination Rate.