Yes, and the article author is fully aware of that. Thank you for pointing out this small mistake though.

It looks like the author is specifically avoiding the model's name, because the results are really weird.

  Opus 4.8/4.7 scored 28%

  Opus 4.6 scored 37%

So the author thought: let's not get into that, just write "Claude".

Not weird at all, given the variance in Opus' quality over the last few months.

wild guess - I wouldn't be surprised if Opus 4.6 was run quantized for a while, and 4.7/4.8 have QAT for that nerfed size.

many people think Opus 4.6 was the best

Hello! Author here (Katie). Ty for your comments. 4.6 and 4.7 both scored 28% on our benchmark; I just wanted to have 10 things in the list because I wanted a round number.

Where is the weird part?