So to verify their claims and see how strong these models are, the answer is "believe us"?

Note: I'm expressing some skepticism here largely due to how recent rollouts from Meta flopped. Sincerely hoping that they do better this time around!

I assume the answer is to try it out in the chat mode? You could run your usual benches through that, right?