They would score much worse on the private set than the public set. And they haven't done this for any of the other ARC-AGI benchmarks, so why would they do it for this one?