The scores they're getting are on the order of 0-1% for this ARC-AGI-3 benchmark.

Didn’t I just see a post about 36% from someone?