We did a lot of internal testing but have not run an official benchmark.
We found that the less the agent knows, the more it hallucinates.