Link to the paper: https://arxiv.org/pdf/2506.21734

Still reading, but the benchmarks for ARC-AGI-1, ARC-AGI-2, Sudoku-Extreme (9x9), and Maze-Hard (30x30) look impressive.

Someone on GitHub reproduced it, but the paper doesn't report total GPU hours, and their benchmark results were 10-20% lower (I read this in a GitHub issue).