Their ARC-AGI-2 leaderboard[0] scores are nearly insensitive to reasoning effort. Low effort gets 64.6% and High effort gets 69.2%, a gap of only 4.6 percentage points.
This is unlike both their previous generation of models and their competitors' models.
What does this indicate?