Hacker News

Their ARC-AGI-2 leaderboard[0] scores are nearly insensitive to reasoning effort: Low effort gets 64.6% and High effort gets 69.2%, a gap of only 4.6 percentage points.

This is unlike both their previous generation of models and their competitors' models.

What does this indicate?