I would take all benchmarks with a grain of salt. I don't really use them. What are they supposed to tell me? "5% smarter", what does that mean? My experience will differ. Just try it!

I doubt Anthropic sets improving this or that benchmark as an internal goal; benchmarks are just a way to visualize progress. They probably have much more complex metrics internally.