If what you want is auto-complete (e.g. CoPilot, or natural language search) then LLMs are built for that, and useful.
If what you want is AGI then design an architecture with the necessary moving parts! The current approach reminds me of the joke about the drunk looking for his dropped car keys under the street lamp because "it's bright here", rather than near where he actually dropped them. It seems folk have spent years trying to come up with alternate learning mechanisms to gradient descent (or RL), and having failed are now trying to use SGD/pre-training for AGI "because it's what we've got", as opposed to doing the hard work of designing the type of always-on online learning algorithm that AGI actually requires.
The SGD/pre-training/deep learning/transformer local maximum is profitable. Trying new things is not, so you are relying on researchers making a breakthrough, but then to make a blip you need a few billion dollars to move the promising model into production.
The tide of money flow means we are probably locked into transformers for some time. There will be transformer ASICs built in droves, for example. It will be hard to compete with the status quo. Transformer architecture == x86 of AI.
I think it's possible that the breakthrough(s) needed for AGI could be developed anytime now, by any number of people (probably doesn't need to be a heavily funded industry researcher), but as long as people remain hopeful that LLMs just need a few more $10B's to become sentient, it might not be able to rise above the noise. Perhaps we need an LLM/dinosaur extinction event to give the mammals space to evolve...
Isn't RL the algorithm we want basically?
Want for what?
RL is one way to implement goal directed behavior (making decisions now that hopefully will lead towards a later reward), but I doubt this is the actual mechanism at play when we exhibit goal directed behavior ourselves. Something more RL-like may potentially be used in our cerebellum (not cortex) to learn fine motor skills.
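(To be concrete about the mechanism I mean by RL here: something like tabular Q-learning, where value estimates propagate a delayed reward back to earlier decisions. The toy chain environment and numbers below are made up purely for illustration.)

```python
import random

# "Decisions now for a reward later": tabular Q-learning on a made-up
# 4-state chain where only reaching the last state pays off, so early
# moves are only rewarded several steps later.
N_STATES, GOAL = 4, 3
Q = [[0.0, 0.0] for _ in range(N_STATES)]   # Q[state][action], actions: 0=left, 1=right
alpha, gamma, eps = 0.1, 0.9, 0.3

def step(s, a):
    s2 = max(0, s - 1) if a == 0 else min(GOAL, s + 1)
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

for _ in range(2000):
    s = 0
    for _ in range(100):                     # cap episode length
        a = random.randrange(2) if random.random() < eps else Q[s].index(max(Q[s]))
        s2, r, done = step(s, a)
        # Bootstrapped update: credit for the delayed reward flows
        # backwards through the value estimates.
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2
        if done:
            break

print(Q)   # "right" (action 1) should end up preferred in every state
```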
Some of the things that are clearly needed for human-like AGI are the ability to learn incrementally and continuously (the main ways we learn are by trial and error, and by copying), as opposed to pre-training with SGD, plus things like working memory, the ability to think to arbitrary depth before acting, innate qualities like curiosity and boredom to drive learning and exploration, etc.
The Transformer architecture underlying all of today's LLMs has none of the above, which is not surprising since it was never intended as a cognitive architecture - it was designed for seq2seq tasks such as language modelling.
So, no, I don't think RL is the answer to AGI, and note that DeepMind who had previously believed that have since largely switched to LLMs in the pursuit of AGI, and are mostly using RL as part of more specialized machine learning applications such as AlphaGo and AlphaFold.
But RL algorithms do implement things like curiosity to drive exploration?? https://arxiv.org/pdf/1810.12894.
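That's the Random Network Distillation paper; the core trick is roughly the sketch below (my own rough reconstruction, with made-up sizes, not the paper's exact setup):

```python
import torch
import torch.nn as nn

# Random Network Distillation, rough idea: a frozen, randomly
# initialised "target" network and a trained "predictor" network.
# The predictor's error on a state is the intrinsic ("curiosity")
# reward: novel states are poorly predicted, so visiting them pays.
obs_dim, feat_dim = 64, 32   # made-up sizes for illustration

target = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, feat_dim))
predictor = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, feat_dim))
for p in target.parameters():
    p.requires_grad_(False)   # target stays frozen forever

opt = torch.optim.Adam(predictor.parameters(), lr=1e-4)

def intrinsic_reward(obs_batch):
    # Prediction failure as the signal.
    with torch.no_grad():
        goal = target(obs_batch)
    err = ((predictor(obs_batch) - goal) ** 2).mean(dim=-1)
    # Train the predictor online, so states visited often stop being
    # "interesting" (boredom), while unseen states stay novel.
    loss = err.mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return err.detach()   # added to the extrinsic reward in the RL update

print(intrinsic_reward(torch.randn(8, obs_dim)))
```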
Thinking to arbitrary depth sounds like Monte Carlo tree search? Which is often implemented in conjunction with RL. And working memory I think is a matter of the architecture you use in conjunction with RL, agree that transformers aren't very helpful for this.
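To make the "thinking to arbitrary depth" point concrete, a minimal single-player MCTS skeleton is below. The toy environment and constants are made up, and a real system would pair the search with a learned RL policy/value instead of the trivial rollout here:

```python
import math, random

# Minimal MCTS sketch: select / expand / simulate / backup.
# Toy problem: only one action sequence pays off, so the search has to
# "think" several moves deep before committing to anything.
TARGET = [1, 0, 1, 1, 0, 1]
ACTIONS = [0, 1]

def is_terminal(path): return len(path) == len(TARGET)
def reward(path): return 1.0 if path == TARGET else 0.0

class Node:
    def __init__(self, path):
        self.path, self.children, self.visits, self.value = path, {}, 0, 0.0

def ucb(parent, child, c=1.4):
    if child.visits == 0:
        return float("inf")
    return child.value / child.visits + c * math.sqrt(math.log(parent.visits) / child.visits)

def simulate(root):
    node, trail = root, [root]
    # Walk down the tree, expanding unvisited children as we go.
    while not is_terminal(node.path):
        for a in ACTIONS:
            if a not in node.children:
                node.children[a] = Node(node.path + [a])
        node = max(node.children.values(), key=lambda ch: ucb(trail[-1], ch))
        trail.append(node)
    # Rollout is trivial here because selection runs to a terminal state.
    r = reward(node.path)
    # Backup: propagate the outcome to every node on the path.
    for n in trail:
        n.visits += 1
        n.value += r

root = Node([])
for _ in range(5000):
    simulate(root)

# Greedy readout of the most-visited path:
node, plan = root, []
while node.children:
    a, node = max(node.children.items(), key=lambda kv: kv[1].visits)
    plan.append(a)
print(plan)   # should recover TARGET
```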
I think what you call 'trial and error', is what I intuitively think of RL as doing.
AlphaProof runs an RL algorithm during training, AND at inference time. When given an olympiad problem, it generates many variations on that problem, tries to solve them, and then uses RL to effectively finetune itself on the particular problem currently being solved. Note again that this process is done at inference time, not just training.
And AlphaProof uses an LLM to generate the Lean proofs, and uses RL to train this LLM. So it kinda strikes me as a type error to say that DeepMind have somehow abandoned RL in favour of LLMs? Note this Demis tweet https://x.com/demishassabis/status/1816596568398545149 where it seems like he is saying that they are going to combine some of this RL stuff with the main gemini models.
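Roughly the loop I have in mind, as I understand DeepMind's announcement. Every name below is my own placeholder, not their actual API, and the details are guesses:

```python
# Sketch of AlphaProof-style test-time RL as I understand it; all of the
# functions/objects here are hypothetical stand-ins, not DeepMind code.

def solve_with_test_time_rl(problem, model, lean_checker, rounds=10):
    for _ in range(rounds):
        # Generate many variations of the given olympiad problem.
        variants = model.propose_variants(problem)
        solved = []
        for v in variants:
            proof = model.generate_proof(v)        # LLM emits a Lean proof attempt
            if lean_checker.verifies(v, proof):    # formal verifier = reward signal
                solved.append((v, proof))
        # RL / fine-tuning step at inference time: the model is updated on
        # its own verified successes, so it gets better at *this* problem.
        model.reinforce(solved)
        final = model.generate_proof(problem)
        if lean_checker.verifies(problem, final):
            return final
    return None
```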
> But RL algorithms do implement things like curiosity to drive exploration??
I hadn't read that paper, but yes, using prediction failure as a learning signal (and attention mechanism), the same as we do, is what I had in mind. It seems that to be useful it needs to be combined with online learning ability, so that having explored, one's predictions will be better next time.
It's easy to imagine LLMs being extended in all sorts of ad-hoc ways, including external prompting/scaffolding such as "think step by step" and tree search, which help mitigate some of the architectural shortcomings, but I think online learning is going to be tough to add in this way, and it also seems that using the model's own output as a substitute for working memory isn't sufficient to support long-term focus and reasoning. You can try to script intelligence by putting the long-term focus and tree search into an agent, but I think that will only get you so far. At the end of the day a pre-trained transformer really is just a fancy sentence completion engine, and while it's informative how much "reactive intelligence" emerges from this type of frozen prediction, it seems the architecture has been stretched about as far as it will go.
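What I mean by scaffolding / output-as-memory, concretely, is a loop like the one below, where everything the model "remembers" has to be re-fed as text each step. `call_model` is just a dummy stand-in for whatever LLM API you'd use:

```python
# Minimal sketch of prompt scaffolding: the model's own text output is
# carried forward as a stand-in for working memory. The real model call
# is replaced by a dummy so the loop runs.

def call_model(prompt: str) -> str:
    return "..."   # placeholder: a real call would go to an LLM here

def scaffolded_agent(task: str, max_steps: int = 5) -> str:
    scratchpad = ""                        # "memory" lives in the prompt, not the model
    for step in range(max_steps):
        prompt = (
            f"Task: {task}\n"
            f"Notes so far:\n{scratchpad}\n"
            "Think step by step, then either continue or answer with FINAL: ..."
        )
        out = call_model(prompt)
        scratchpad += f"\n[step {step}] {out}"   # everything 'remembered' is re-fed as text
        if out.startswith("FINAL:"):
            return out
    return scratchpad

print(scaffolded_agent("example task"))
```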
I wasn't saying that DeepMind have abandoned RL in favor of LLMs, just that they are using RL in more narrow applications than AGI. David Silver at least still also seems to think that "Reward is enough" [for AGI], as of a few years ago, although I think most people disagree.
Hmm, well the reason a pre-trained transformer is a fancy sentence completion engine is that that is what it is trained on: cross-entropy loss on next-token prediction. As I say, if you train an LLM to do math proofs, it learns to solve 4 out of the 6 IMO problems. I feel like you're not appreciating how impressive that is. And that is only possible because of the RL aspect of the system.
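To spell out the objective I mean by "cross entropy loss on next token prediction" - toy sizes and a random batch below, with a two-layer stack standing in for a real transformer:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Standard LM objective: cross-entropy on next-token prediction.
# Sizes and "corpus" are made up; the Sequential is a stand-in for the
# actual transformer stack.
vocab, d_model, seq_len = 100, 32, 16
model = nn.Sequential(nn.Embedding(vocab, d_model),
                      nn.Linear(d_model, vocab))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

tokens = torch.randint(0, vocab, (8, seq_len + 1))   # fake batch of token ids

for step in range(100):
    inputs, targets = tokens[:, :-1], tokens[:, 1:]  # predict token t+1 from tokens <= t
    logits = model(inputs)                           # (batch, seq, vocab)
    loss = F.cross_entropy(logits.reshape(-1, vocab), targets.reshape(-1))
    opt.zero_grad(); loss.backward(); opt.step()

# Whatever you train it *on* (web text, Lean proofs), the objective is the
# same "fill in the next token" loss; only the data differs.
```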
To be clear, I'm not claiming that you take an LLM and do some RL on it and suddenly it can do particular tasks. I'm saying that if you train it from scratch using RL it will be able to do certain well-defined formal tasks.
Idk what you mean about the online learning ability tbh. The paper uses it in the exact way you specify: it uses RL to play Montezuma's Revenge and gets better on the fly.
Similar to my point about the inference-time RL ability of the AlphaProof LLM. That's why I emphasized that RL is done at inference time: each proof it attempts is used to make itself better for the next one.
I think you are taking LLM to mean GPT style models, and I am taking LLM to mean transformers which output text, and they can be trained to do any variety of things.
A transformer, regardless of what it is trained to do, is just a pass-through architecture consisting of a fixed number of layers, no feedback paths, and no memory from one input to the next. Most of its limitations (wrt AGI) stem from the architecture. How you train it, and on what, can't change that.
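Concretely, what I mean by pass-through/stateless: each call below is an independent forward pass through a fixed stack, and nothing carries over between inputs. Sizes are arbitrary, just for illustration:

```python
import torch
import torch.nn as nn

# A fixed stack of layers, no recurrence, no state carried from one
# input to the next. Dropout is disabled so outputs are deterministic.
d_model, n_heads, n_layers = 64, 4, 6
layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                   dropout=0.0, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
encoder.eval()

x1 = torch.randn(1, 10, d_model)   # first "input"
x2 = torch.randn(1, 10, d_model)   # second "input"

y1 = encoder(x1)
y2 = encoder(x2)          # nothing about x1 influences this call: no memory between inputs
y1_again = encoder(x1)

# Same input, same output: weights are frozen at inference and there is
# no internal state, so processing x2 in between changed nothing.
print(torch.allclose(y1, y1_again))
```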
Narrow skills like playing chess (Deep Blue), Go, or doing math proofs are impressive in some sense, but not the same as the generality and/or intelligence which are the hallmarks of AGI. Note that AlphaProof, as the name suggests, has more in common with AlphaGo and AlphaFold than with a plain transformer. It's a hybrid neuro-symbolic approach where the real power is coming from the search/verification component. Sure, RL can do some impressive things when the right problem presents itself, but it's not a silver bullet for all machine learning problems, and few outside of David Silver think it's going to be the/a way to achieve AGI.
I agree with you that transformers are probably not the architecture of choice. Not sure what that has to do with the viability of RL though.