I unironically believe that arc-agi-3 will have a introduction to solved time of 1 month

Not very likely?

ARC-AGI-3 has a nasty combo of spatial reasoning + explore/exploit. It's basically adversarial vs current AIs.

We will see at the end of April right? It's more of a guess than a strongly held conviction--but I see models improving rapidly at long horizon tasks so I think it's possible. I think a benchmark which can survive a few months (maybe) would be if it genuinely tested long time-frame continual learning/test-time learning/test-time posttraining (idk honestly the differences b/t these).

But i'm not sure how to give such benchmarks. I'm thinking of tasks like learning a language/becoming a master at chess from scratch/becoming a skill artists but where the task is novel enough for the actor to not be anywhere close to proficient at beginning--an example which could be of interest is, here is a robot you control, you can make actions, see results...become proficient at table tennis. Maybe another would be, here is a new video game, obtain the best possible 0% speedrun.

The AGI bar has to be set even higher, yet again.

And that's the way it should be. We're past the "Look! It can talk! How cute!" stage. AGI should be able to deal with any problem a human can.

wow solving useless puzzles, such a useful metric!

How is spatial reasoning useless??