> Which ones are you looking at? Since the benchmark comparison in the blogpost itself doesn't include Opus at all.

I manually compared it with the values from the benchmarks they published when they originally announced the Claude 3 model family[0].

Not every row has a 1:1 counterpart in the current benchmarks, but I think it paints a good enough picture.

[0]: https://www.anthropic.com/news/claude-3-family