Been shouting this for over a year now. We're training AI to be convincing, not to be actually helpful. We're sampling the wrong distributions.
Depends on who you ask.
Advertising and propaganda are not necessarily helpful to consumers; they just need to be convincing in order to be helpful to producers.
It would be interesting to see RL on a chatbot that's the last stage of a sales funnel for some high-volume item--it'd have fast, real-world feedback on how convincing it is, in the form of a purchase decision.
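Concretely, the simplest version of that is a contextual bandit with the purchase decision as the reward. A toy epsilon-greedy sketch (the strategy names and reward shape here are made up purely for illustration):

    import random

    # hypothetical closing "pitches" the chatbot could choose between
    STRATEGIES = ["discount_nudge", "scarcity", "social_proof", "plain_summary"]
    counts = {s: 0 for s in STRATEGIES}
    values = {s: 0.0 for s in STRATEGIES}  # running mean reward per strategy

    def pick_strategy(epsilon=0.1):
        # epsilon-greedy: mostly exploit the most convincing pitch so far
        if random.random() < epsilon:
            return random.choice(STRATEGIES)
        return max(STRATEGIES, key=lambda s: values[s])

    def record_outcome(strategy, purchased):
        # fast, real-world feedback: the purchase decision is the reward
        counts[strategy] += 1
        reward = 1.0 if purchased else 0.0
        values[strategy] += (reward - values[strategy]) / counts[strategy]

Note the objective here is purely "convincing", per the thread's point: nothing in the reward says the purchase was actually good for the buyer.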
If what you want is auto-complete (e.g. CoPilot, or natural language search) then LLMs are built for that, and useful.
If what you want is AGI, then design an architecture with the necessary moving parts! The current approach reminds me of the joke about the drunk looking for his dropped car keys under the street lamp because "it's bright here", rather than near where he actually dropped them. Folk have spent years trying to come up with alternate learning mechanisms to gradient descent (or RL), and having failed are now trying to use SGD/pre-training for AGI "because it's what we've got", as opposed to doing the hard work of designing the type of always-on online learning algorithm that AGI actually requires.
The SGD/pre-training/deep-learning/transformer local maximum is profitable. Trying new things is not, so you are relying on researchers making a breakthrough, but then to make a blip you need a few billion dollars to move the promising model into production.
The tide of money flow means we are probably locked into transformers for some time. Transformer ASICs, for example, will be built in droves. It will be hard to compete with the status quo. Transformer architecture == x86 of AI.
I think it's possible that the breakthrough(s) needed for AGI could be developed anytime now, by any number of people (probably doesn't need to be a heavily funded industry researcher), but as long as people remain hopeful that LLMs just need a few more $10B's to become sentient, it might not be able to rise above the noise. Perhaps we need an LLM/dinosaur extinction event to give the mammals space to evolve...
Isn't RL the algorithm we want basically?
Want for what?
RL is one way to implement goal directed behavior (making decisions now that hopefully will lead towards a later reward), but I doubt this is the actual mechanism at play when we exhibit goal directed behavior ourselves. Something more RL-like may potentially be used in our cerebellum (not cortex) to learn fine motor skills.
Some of the things clearly needed for human-like AGI are the ability to learn incrementally and continuously (the main ways we learn are by trial and error, and by copying), as opposed to pre-training with SGD; working memory; the ability to think to arbitrary depth before acting; innate qualities like curiosity and boredom to drive learning and exploration; etc.
The Transformer architecture underlying all of today's LLMs has none of the above, which is not surprising since it was never intended as a cognitive architecture: it was designed for seq2seq tasks such as language modeling.
So, no, I don't think RL is the answer to AGI, and note that DeepMind who had previously believed that have since largely switched to LLMs in the pursuit of AGI, and are mostly using RL as part of more specialized machine learning applications such as AlphaGo and AlphaFold.
But RL algorithms do implement things like curiosity to drive exploration? See https://arxiv.org/pdf/1810.12894
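The core trick in that paper (Random Network Distillation) fits in a few lines. A toy sketch with made-up dimensions, not the paper's actual code:

    import torch
    import torch.nn as nn

    obs_dim, feat_dim = 64, 32
    target = nn.Linear(obs_dim, feat_dim)     # fixed, randomly initialized
    predictor = nn.Linear(obs_dim, feat_dim)  # trained to imitate the target
    for p in target.parameters():
        p.requires_grad_(False)
    opt = torch.optim.Adam(predictor.parameters(), lr=1e-4)

    def intrinsic_reward(obs):
        # novel states are poorly predicted -> high error -> high curiosity bonus
        err = (predictor(obs) - target(obs)).pow(2).mean()
        opt.zero_grad()
        err.backward()
        opt.step()  # predictor catches up, so familiar states stop being "interesting"
        return float(err.detach())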
Thinking to arbitrary depth sounds like Monte Carlo tree search, which is often implemented in conjunction with RL. And working memory is, I think, a matter of the architecture you use in conjunction with RL; agreed that transformers aren't very helpful for this.
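For concreteness, the selection rule at the heart of MCTS (UCT) is what lets it spend more "thinking" on promising branches. A minimal sketch; the node shape and exploration constant are just illustrative:

    import math
    from dataclasses import dataclass, field

    @dataclass
    class Node:
        visits: int = 0
        total_reward: float = 0.0
        children: list = field(default_factory=list)

    def uct_score(node, parent_visits, c=1.4):
        if node.visits == 0:
            return float("inf")  # always try unvisited children once
        exploit = node.total_reward / node.visits
        explore = c * math.sqrt(math.log(parent_visits) / node.visits)
        return exploit + explore

    def select_child(parent):
        # applied repeatedly from the root, this grows the search tree
        # deeper exactly where the reward estimates say it is worthwhile
        return max(parent.children, key=lambda ch: uct_score(ch, parent.visits))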
I think what you call 'trial and error' is what I intuitively think of RL as doing.
AlphaProof runs an RL algorithm during training, AND at inference time. When given an olympiad problem, it generates many variations on that problem, tries to solve them, and then uses RL to effectively finetune itself on the particular problem currently being solved. Note again that this process is done at inference time, not just training.
And AlphaProof uses an LLM to generate the Lean proofs, and uses RL to train this LLM. So it kinda strikes me as a type error to say that DeepMind have somehow abandoned RL in favour of LLMs? Note this Demis tweet https://x.com/demishassabis/status/1816596568398545149 where it seems like he is saying that they are going to combine some of this RL stuff with the main Gemini models.
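To make the inference-time loop concrete, here is how I read the AlphaProof setup, as a sketch. Every name below is hypothetical, my reading of the published description, not DeepMind's code:

    class StubModel:
        # stand-ins only; the real system is an LLM policy plus an RL update
        def sample_proof(self, problem):
            return "sorry"  # placeholder Lean proof attempt

        def rl_update(self, problem, proof, reward):
            pass  # placeholder for the finetuning step

    def generate_variations(problem):
        # stand-in: the real system generates many related/easier problems
        return [f"{problem} (variant {i})" for i in range(8)]

    def verifier_accepts(problem, proof):
        return False  # stand-in for the Lean proof checker

    def solve_with_test_time_rl(problem, model):
        for variant in generate_variations(problem):
            proof = model.sample_proof(variant)
            reward = 1.0 if verifier_accepts(variant, proof) else 0.0
            model.rl_update(variant, proof, reward)  # RL at inference time
        return model.sample_proof(problem)  # then attempt the actual problem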
> But RL algorithms do implement things like curiosity to drive exploration??
I hadn't read that paper, but yes, using prediction failure as a learning signal (and attention mechanism), the same as we do, is what I had in mind. It seems that to be useful it needs to be combined with an online learning ability, so that having explored, one's predictions will be better next time.
It's easy to imagine LLMs being extended in all sorts of ad hoc ways, including external prompting/scaffolding such as "think step by step" and tree search, which help mitigate some of the architectural shortcomings, but I think online learning is going to be tough to add in this way. It also seems that using the model's own output as a substitute for working memory isn't sufficient to support long-term focus and reasoning. You can try to script intelligence by putting the long-term focus and tree search into an agent, but I think that will only get you so far. At the end of the day a pre-trained transformer really is just a fancy sentence-completion engine, and while it's informative how much "reactive intelligence" emerges from this type of frozen prediction, it seems the architecture has been stretched about as far as it will go.
I wasn't saying that DeepMind have abandoned RL in favor of LLMs, just that they are using RL in more narrow applications than AGI. David Silver at least still also seems to think that "Reward is enough" [for AGI], as of a few years ago, although I think most people disagree.
Hmm, well, the reason a pre-trained transformer is a fancy sentence-completion engine is that that's what it is trained on: cross-entropy loss on next-token prediction. As I say, if you train an LLM to do math proofs, it learns to solve 4 out of the 6 IMO problems. I feel like you're not appreciating how impressive that is. And that is only possible because of the RL aspect of the system.
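For reference, that objective is literally just this (toy shapes; the logits are a stand-in for a real model's output):

    import torch
    import torch.nn.functional as F

    vocab, seq_len, batch = 1000, 16, 4
    tokens = torch.randint(0, vocab, (batch, seq_len))
    logits = torch.randn(batch, seq_len - 1, vocab, requires_grad=True)

    # shift by one: the prediction at position t is scored against token t+1
    loss = F.cross_entropy(logits.reshape(-1, vocab), tokens[:, 1:].reshape(-1))
    loss.backward()

Swap that loss for a verifier-derived reward and you get the proof-solving behavior, which is my point: same architecture, different training signal.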
To be clear, I'm not claiming that you take an LLM and do some RL on it and suddenly it can do particular tasks. I'm saying that if you train it from scratch using RL, it will be able to do certain well-defined formal tasks.
Idk what you mean about the online learning ability, tbh. The paper uses it in the exact way you specify: it uses RL to play Montezuma's Revenge and gets better on the fly.
Similar to my point about the inference-time RL ability of the AlphaProof LLM. That's why I emphasized that RL is done at inference time: each proof it does is used to make itself better for next time.
I think you are taking LLM to mean GPT style models, and I am taking LLM to mean transformers which output text, and they can be trained to do any variety of things.
A transformer, regardless of what it is trained to do, is just a pass-through architecture consisting of a fixed number of layers, with no feedback paths and no memory from one input to the next. Most of its limitations (wrt AGI) stem from the architecture. How you train it, and on what, can't change that.
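You can see the point directly in code, using PyTorch's stock transformer purely as an illustration (dimensions arbitrary):

    import torch
    import torch.nn as nn

    layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
    model = nn.TransformerEncoder(layer, num_layers=6)  # depth fixed at build time

    x1 = torch.randn(1, 10, 64)  # one input sequence
    x2 = torch.randn(1, 10, 64)  # a later, unrelated input
    y1 = model(x1)
    y2 = model(x2)  # nothing from the x1 call persists: no feedback path, no memory

Whatever the training objective, every call is the same fixed-depth feed-forward pass.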
Narrow skills like playing chess (DeepBlue), Go, or doing math proofs are impressive in some sense, but not the same as the generality and/or intelligence that are the hallmarks of AGI. Note that AlphaProof, as the name suggests, has more in common with AlphaGo and AlphaFold than with a plain transformer. It's a hybrid neuro-symbolic approach where the real power is coming from the search/verification component. Sure, RL can do some impressive things when the right problem presents itself, but it's not a silver bullet for all machine learning problems, and few outside of David Silver think it's going to be the/a way to achieve AGI.
I agree with you that transformers are probably not the architecture of choice. Not sure what that has to do with the viability of RL though.
Sideways eye look at leetcode culture
I find them very helpful, personally.
Understandable, they have been trained to convince you of their helpfulness.
If they convinced me of their helpfulness, and their output is actually helpful in solving my problems.. well, if it walks like a duck and quacks like a duck, and all that.
if it walks like a duck and it quacks like a duck, then it lacks strong typing.
"Appears helpful" and "is helpful" are two very different properties, as it turns out.
Sometimes, but that's an edge case that doesn't seem to impact the productivity boosts from LLMs
It doesn't until it does. Productivity isn't the only or even the most important metric, at least in software dev.
Can you be more specific with like examples or something?
This is true, but part of that convincing is actually providing at least some amount of response that is helpful and moving you forward.
I have to use coding as an example, because that's 95% of my use cases. I type in a general statement of the problem I'm having and within seconds, I get back a response that speaks my language and provides me with some information to ingest.
Now, I don't know for sure if every sentence I read in the response is correct, but let's say that 75% of what I read aligns with what I currently know to be true. If I were to ask a real expert, I'd possibly understand or already know 75% of what they're telling me as well, with the other 25% still to be understood, and thus requiring trust in the expert.
But either with AI or a real expert, for coding at least, that 25% will be easily testable. I go and implement and see if it passes my test. If it does, great. If not, at least I have tried something and gotten farther down the road in my problem solving.
Since AI generally does that for me, I'm convinced of its helpfulness because it moves me along.
https://xkcd.com/810/
s/AI/Marketing|Ads|Consultants|Experts|Media|Politicians|.../