Hacker News

bicx 5 hours ago

That last benchmark seemed like an impressive leg up against Opus until I saw the sneaky footnote that it was actually a Sonnet result. Why even include it then, other than hoping people don't notice?

osti 5 hours ago

It's only that one number that is for Sonnet.

0123456789ABCDE 4 hours ago

except for the WebArena-Verified one

conradkay 5 hours ago

Sonnet was pretty close to (or better than) Opus in a lot of benchmarks, so I don't think it's a big deal.

jitl 5 hours ago

wat

0123456789ABCDE 4 hours ago

maybe gp's "a lot of" is unwarranted

but https://artificialanalysis.ai indicates that Sonnet 4.6 beats Opus 4.6 on GDPval-AA, Terminal-Bench Hard, AA Long Context Reasoning, and IFBench.

see: https://artificialanalysis.ai/?models=claude-sonnet-4-6%2Ccl...