Those stats don't necessarily line up that way. Do you have a link?

Given the way the test was structured, it does line up.

https://arxiv.org/abs/2503.23674

Surprisingly good. I wonder how they would have done without the 5-minute limit on conversations (an average of 8 messages per convo, per the study).