The scores they're getting are on the order of 0-1% for this ARC-AGI-3 benchmark.
Didn’t I just see a post about 36% from someone?