the most cited is Terminal-Bench 2.0, but it's also plagued by cheating accusations and benchmaxxing.
somewhat remarkably, Claude Code ranks last among the Opus 4.6 entries, which may say something about Claude Code, or something about the benchmark.