Hacker News

I unironically believe that ARC-AGI-3 will go from introduction to solved in about 1 month.

Not very likely?

ARC-AGI-3 has a nasty combo of spatial reasoning + explore/exploit. It's basically adversarial vs current AIs.

We'll see at the end of April, right? It's more of a guess than a strongly held conviction, but I see models improving rapidly at long-horizon tasks, so I think it's possible. A benchmark that might survive a few months would be one that genuinely tests long-time-frame continual learning / test-time learning / test-time post-training (honestly, I don't know the differences between these).

But I'm not sure how to design such benchmarks. I'm thinking of tasks like learning a language, becoming a master at chess from scratch, or becoming a skilled artist, but where the task is novel enough that the actor is nowhere close to proficient at the beginning. One example that could be interesting: here is a robot you control; you can take actions and see the results; become proficient at table tennis. Another might be: here is a new video game; achieve the best possible 0% speedrun.

etyhhgfff 2 months ago

The AGI bar has to be set even higher, yet again.

red75prime 2 months ago

And that's the way it should be. We're past the "Look! It can talk! How cute!" stage. AGI should be able to deal with any problem a human can.

dakolli 2 months ago

Wow, solving useless puzzles, such a useful metric!

esafak 2 months ago

How is spatial reasoning useless??