I'll probably be the skeptic here, but:

- Take a person who grew up playing video games. They'll pass these tests 100% without even breaking a sweat.

- BUT, put a grandmother who has never used a computer in front of this game, and she'll most likely fail completely. Just like an LLM.

As soon as models are "natively" trained on a massive dataset of these types of games, they'll easily adapt and start crushing these challenges.

This is not AGI at all.

Isn’t this what AGI is by design? People CAN learn to become good at videogames. Modern LLMs can’t, they have to be retrained from scratch (I consider pre-training to be a completely different process than learning). I also don’t necessarily agree that a grandma would fail. Give her enough motivation and a couple days and she’ll manage these.

My main criticism would be that it doesn’t seem like this test allows online learning, which is what humans do (over the scale of days to years). So in practice it may still collapse to what you point out, but not because the task is unsuited to showing AGI.

What I'm saying is that this test is just another "out-of-distribution task" for an LLM. And it will be solved using the exact same methods we always use: it will end up in the pre-training data, and LLMs will crush it.

This has absolutely nothing to do with AGI. Once they beat these tests, new ones will pop up. They'll beat those, and people will invent the next batch.

The way I see it, the true formula for AGI is: [Brain] + [External Sensors] (World Receptors) + [Internal State Sensors] + [Survival Function] + [Memory].

I won't dive too deep into how each of these components has its own distinct traits and is deeply intertwined with the others (especially the survival function and memory). But on a fundamental level, my point is that we are not going to squeeze AGI out of LLMs just by throwing more tests and training cycles at them.

These current benchmarks aren't bringing us any closer to AGI. They merely prove that we've found a new layer of tasks that we simply haven't figured out how to train LLMs on yet.

P.S. A 2-year-old child is already an AGI in terms of functional makeup and internal interaction architecture, even though the child is far less equipped for survival than a kitten. The path to AGI isn't just endless task training; it's a shift toward a fundamentally different decision-making architecture.

Good post, but I disagree that a Survival Function is needed for AGI. Why do you think it is needed?

The item I think you should add is a Mesolimbic System (Reward / Motivation). I think AGI needs motivation to direct its learning and tasks.

Also, I don't think the industry has advanced over the last 2 years just by training LLMs on more data. RAG, agent loops, skills, and context management are all early forms of a Memory system. An LLM with an updatable working-set memory is a lot more capable than an LLM alone.

> Once they beat these tests, new ones will pop up. They'll beat those, and people will invent the next batch.

that's exactly the point! once we cannot invent the next batch (that is easy for humans to solve), that will be AGI

Kids develop video game skills; grandmothers do not. Hypothetically, grandmothers develop baking skills that kids do not (perfectly golden brown cookies). A human intelligence is generally capable of developing video game skills or baking skills, given enough motivation and experience to hone those skills. One test of AGI is whether the same system can develop both video game skills and baking skills without having to rebuild the core models... this would demonstrate generalized intelligence.

> Isn’t this what AGI is by design?

Well, the "G" in AGI is kinda important. These are specifically games/puzzles.

> they have to be retrained from scratch

Is that true? Didn't DeepMind already build plenty of agents that are generally good at most computer games without being retrained?

had the same thought.

I've been a gamer for just about 40 years. Gaming is my "thing"

I found the challenges fun, but easy. Coming back and reading comments from people struggling with the games, my first thought was - yup definitely not a gamer.

My approach was to poke at the controls to suss the rules, then the actual solutions were really straightforward.

fwiw, I'm pretty dumb generally, but these kinds of puzzles are my jam.

Bingo! That's exactly what I meant