Hacker News

Not hard to understand what's going on here. They RL'd around patterns in their data and specific capabilities, so of course they'd construct a benchmark that's aligned with the training set.

Ironically, their benchmark might be more accurate than artificial analysis for a narrow slice of things that Cursor's Eigencustomer is really interested in. Otherwise I'd take it as just another data point.