Hacker News

Why does Anthropic change the set of benchmarks they use with every new model release?

1. Benchmarks saturate. 2. They select the benchmarks that show the most impressive improvements.