In my tests[0] it does a bit worse, and it's almost 2x as expensive as Opus 4.7...
I was surprised to see that it failed a data extraction test (it gets it right 2 out of 3 times, but once it randomly returns null for a value instead).
It makes some sense that it fails more trivia/domain-specific knowledge tasks (I think models are increasingly trained for agentic use cases rather than general intelligence).
[0]: https://aibenchy.com/compare/anthropic-claude-opus-4-7-mediu...
For some reason everything is 2x (2x cost, 2x avg response time, 2x reasoning and output tokens)...
Double-checking my test harness, but it's the first model that does this, so I doubt the issue is on my side...
EDIT: The harness seems correct; for straight coding tasks they perform identically: https://i.snipboard.io/5xbpzY.jpg
Wait, doesn’t the blog post say the price is the same as 4.7?
> Claude Opus 4.8 is available everywhere today. Pricing for regular usage is unchanged from Opus 4.7: $5 per million input tokens and $25 per million output tokens. Pricing for fast mode is $10 per million input tokens and $50 per million output tokens.
Where do you see the 2x cost?
The total cost of running my benchmarks was 1.6x higher than with Opus 4.7, mostly because of 2x output tokens:
https://i.snipboard.io/vrdwTa.jpg
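Why 1.6x and not 2x: input tokens are billed exactly as before, so only the output share of the bill doubles. Below is a minimal sketch using the per-million prices quoted from the announncement ($5 input, $25 output). The token counts, `run_cost`, `baseline_input` and `baseline_output` are hypothetical and made up for illustration, not taken from the benchmark above.

```python
# Per-million-token prices quoted in the announcement (regular usage).
INPUT_PRICE_PER_M = 5.0
OUTPUT_PRICE_PER_M = 25.0


def run_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a run at the quoted per-million-token prices (hypothetical helper)."""
    return (input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


# Hypothetical token counts, chosen only for illustration (not real benchmark data).
baseline_input = 3_000_000
baseline_output = 900_000

old_cost = run_cost(baseline_input, baseline_output)      # baseline run
new_cost = run_cost(baseline_input, 2 * baseline_output)  # same input, 2x output tokens

print(f"old=${old_cost:.2f} new=${new_cost:.2f} ratio={new_cost / old_cost:.2f}x")
# prints "old=$37.50 new=$60.00 ratio=1.60x"
```

With these made-up numbers, output is 60% of the baseline bill, which is exactly the share that produces a 1.6x overall increase when output doubles. A harness with a different input/output mix would land somewhere between 1x and 2x.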
ah ok, thanks for clarifying!
If it spends 2x the tokens to achieve the same result, that's effectively 2x the cost, in a manner of speaking.
Releasing a new model is the new way to jack up the price, hehe.
That's exactly right.