Benchmark geometric mean
- GPT-5.5: 62.7%
- Opus 4.8: 62.2%
- Kimi K2.7 Code: 56.3%
- Kimi K2.6: 48.2%
Would be nice to have 5.2 and 4.6 for comparison.