Very reasoning-heavy benchmarks do seem like the way to go, since they are the hardest to game.