Hacker News

Yes, and the article author is fully aware of that. Thank you for pointing out this small mistake though.

It looks like the author is specifically avoiding the model's name, because the results are really weird.

  Opus 4.8/4.7 scored 28%

  Opus 4.6 scored 37%

So the author thought: let's not get into that, just write Claude.

happycube 21 hours ago

Not weird at all, given the variance in Opus' quality over the last few months.

wild guess - I wouldn't be surprised if Opus 4.6 was run quantized for a while, and 4.7/4.8 have QAT for that nerfed size.

andriy_koval a day ago

many people think opus 4.6 was the best

insiderphd 10 hours ago

Hello! Author here (Katie). Ty for your comments. 4.6 and 4.7 both scored 28% on our benchmark; I just wanted to have 10 things in the list because I wanted a round number.

raincole 18 hours ago

Where is the weird part?