Hacker News

> While hallucination is probably closer to 100% depending on the question.

But the benchmark didn't ask those questions, and otherwise Grok seems very good at saying when it doesn't know the answer.