It's a weird read because they bring up many things a lot of people have been critiquing for years.

  > But as impressive as these feats are, they obscure a simple truth: being a "test-taker" is not what most people need from an AI.
  > In all these cases, humans aren't relying solely on a fixed body of knowledge learned years ago. We are learning, in real-time, from the context right in front of us.
  > To bridge this gap, we must fundamentally change our optimization direction.
I'm glad the conversation is changing, but it's been frustrating that when these issues were brought up, people blindly pointed to benchmarks. It made doing this type of research difficult (enough to push many people out of it). Then it feels weird to say "harder than we thought" because, well... truthfully, they even state why this result should be expected:

  > They rely primarily on parametric knowledge—information compressed into their weights during massive pre-training runs. At inference time, they function largely by recalling this static, internal memory, rather than actively learning from new information provided in the moment.
And that's only a fraction of the story. Online algorithms aren't enough. You still need a fundamental structure to codify and compress information, determine what needs to be updated (i.e. what is low confidence), actively seek out new information to update that confidence, make hypotheses, and so much more.
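To make that concrete, here is a minimal toy sketch of the kind of loop I mean: a belief store that tracks confidence, and an update step that actively seeks evidence when confidence drops. Everything here (the `Belief` class, the `lookup_evidence` hook, the specific numbers) is a hypothetical illustration, not a description of any real system:

```python
from dataclasses import dataclass

@dataclass
class Belief:
    value: str
    confidence: float  # 0.0 to 1.0

def online_update(store, observation, lookup_evidence, threshold=0.6):
    """Fold one (key, value) observation into the store, seeking evidence when unsure."""
    key, value = observation
    prior = store.get(key)
    if prior is None:
        store[key] = Belief(value, 0.5)                    # new info: modest confidence
        return
    if prior.value == value:
        prior.confidence = min(1.0, prior.confidence + 0.2)  # agreement raises confidence
        return
    prior.confidence = max(0.0, prior.confidence - 0.3)      # conflict lowers it
    if prior.confidence < threshold:
        # Actively seek new information (retrieval, a tool call, an experiment, ...)
        evidence = lookup_evidence(key)
        if evidence is not None:
            store[key] = Belief(evidence, 0.7)

# Usage: a stale belief gets challenged, drops below threshold, and is refreshed.
store = {"library_version": Belief("2.1", 0.5)}
online_update(store, ("library_version", "3.0"), lookup_evidence=lambda k: "3.0")
```

Even this trivial loop needs decisions a pure online learner doesn't make for you: what counts as a conflict, when to go look, and what to do with what you find.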

So I hope the conversation keeps going in a positive direction, but I hope we don't get trapped in a "RL will solve everything" trap. RL is definitely a necessary component and it will no doubt result in improvements, but it also isn't enough. It's really hard to do deep introspection into how you think; it's like trying to measure your measuring stick with your measuring stick. It's so easy to get caught up in oversimplification, and it seems like the brain wants to avoid that kind of introspection. To quote Feynman: "The first principle is that you must not fool yourself, and you are the easiest person to fool." It's even easier when things are exciting. It's so easy because you have evidence for your beliefs (like I said, RL will make improvements). It's so easy because you're smart, and smart enough to fool yourself. So I hope we can learn a bigger lesson: learning isn't easy, and scale is not enough. I really do think we'll get to AGI, but it's going to be a long, bumpy road if we keep putting all our eggs in one basket and hoping there are simple solutions.

  > But as impressive as these feats are, they obscure a simple truth: being a "test-taker" is not what most people need from an AI.
People have been bringing that up long before AI, in critiques of how schooling often tests memorization and regurgitation of facts. Looking up facts is also a large part of the internet, so it is something that's in demand, and I believe a large portion of OpenAI/Claude prompts have a big overlap with Google queries [sorry, no source].

I haven't looked at the details of the benchmarks they used, and it may depend on the domain, but empirically coding agents seem to improve drastically on unseen or recently updated libs when given the latest documentation. So I think that's a matter of the training sets, where the models have been optimized on code documentation.

So the interim step until a better architecture is found is probably more / better training data.

Don't confuse what I'm saying: I do find LLMs useful. You're right about knowledge-based systems being useful, and I'm not disagreeing with that in any way; I don't think any of the researchers claiming LLMs are not a viable path to AGI are, either. We're saying that intelligence is more than knowledge. Superset, not disjoint.

And yes, the LLM success has been an important step toward AGI, but that doesn't mean we can scale it all the way there. We learned a lot about knowledge systems, and that's a big step. But if you wonder why people like Chollet are saying LLMs have held AGI progress back, it is because we put all our eggs in one basket. It's because we've pulled funds and people away from other hard problems to focus on only one. That doesn't mean it isn't a problem that needed to be solved (nor that it is solved), but it does mean research slows or stops on the other problems. When that happens we hit walls, because we can't seamlessly transition. I'm not even trying to say that we shouldn't have most researchers working on the problem that's currently yielding the most success, but the distribution right now is incredibly narrow (and when people want to work on other problems they get mocked and told that the work is pointless. BY OTHER RESEARCHERS).

Sure, you can get to the store by navigating block by block, but you'll get there much faster, more easily, and adapt better to changes in traffic if you incorporate route planning. You would think a bunch of people who work on optimization algorithms would know that A* is a better algorithm than DFS. The irony is that the reason we do DFS is that people have convinced themselves we can just keep going this route to get there, but with more intellectual depth (such as diving into deeper mathematical understanding of these models) you couldn't stay convinced of that.
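Just to make the analogy concrete (this is a toy pathfinding sketch, not a claim about any training pipeline): on an empty grid, uninformed DFS returns whatever route it stumbles onto first, while A*'s heuristic is what buys you a shortest route:

```python
import heapq

def neighbors(p, n):
    x, y = p
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        if 0 <= x + dx < n and 0 <= y + dy < n:
            yield (x + dx, y + dy)

def dfs(start, goal, n):
    stack, seen = [(start, 0)], {start}
    while stack:
        node, dist = stack.pop()
        if node == goal:
            return dist                      # first path found, usually far from shortest
        for nxt in neighbors(node, n):
            if nxt not in seen:
                seen.add(nxt)
                stack.append((nxt, dist + 1))

def astar(start, goal, n):
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])   # admissible Manhattan heuristic
    frontier, best = [(h(start), 0, start)], {start: 0}
    while frontier:
        _, dist, node = heapq.heappop(frontier)
        if node == goal:
            return dist                      # admissible heuristic => shortest path
        for nxt in neighbors(node, n):
            if dist + 1 < best.get(nxt, float("inf")):
                best[nxt] = dist + 1
                heapq.heappush(frontier, (dist + 1 + h(nxt), dist + 1, nxt))

print(dfs((0, 0), (9, 9), 10))     # some valid path length, typically much longer than 18
print(astar((0, 0), (9, 9), 10))   # 18, the shortest route
```

Both get you to the store; only one of them is doing route planning.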

For all the disparagement of “fact regurgitation” as pedagogical practice, it’s not like there’s some proven better alternative. Higher-order reasoning doesn’t happen without a thorough catalogue of domain knowledge readily accessible in your context window.