> very hesitant to trust studies like this

Why? Simply because there is a plethora of "studies" from the AI industry benchmaxing? Or because every single time the outcome is in favor of the tools, it turns out, once you actually check the methodology, that they are comparing apples and oranges? Truly, I don't get your skepticism. /s, obviously.

Jokes aside, whenever I read about such a study from a field that is NOT mine, I try to get the opinion of an actual expert. They know the real-world context that typically makes the study crumble under proper scrutiny.