> How is it different from a person believing whatever they read on the Internet?
The problem is LLMs have no capacity for shame.
My Dad got taken in by a Target gift card scam. He felt so terrible, he almost didn't even tell me about it. He may get scammed again, but not by anything remotely like that.
To LLMs, all mistakes just get washed together into the same bucket. They don't spend days feeling depressed and stupid over getting scammed. There's no giant blinking red light that says, "Never let this happen again!"
> The problem is LLMs have no capacity for shame.
I know what you mean but I can't help but be cheeky: https://www.fastcompany.com/91383271/googles-chatbot-apologi...
Jokes aside, shame does not change the underlying point though. Despite feeling ashamed for being tricked, as you point out people can still get scammed again by different tricks. I think your point is more about learning from mistakes than shame.
Which still does not change the underlying point, I suppose. Offhand I cannot think of anything that would fix this problem for LLMs that wouldn't also fix it for humans, like relying on trusted sources.
>The problem is LLMs have no capacity for shame
You seem to be implying that people do, and I'd like to contest that point *gestures wildly at everything*
This is a great point. I've added it to my list of things when talking about the limitations of LLMs.
IMO we must take it a step further: In this context "the LLM" we're all automatically thinking-of doesn't exist, it is a fictional character we humans "see" inside a story being acted-out or read to us. (In contrast, the real-world LLM is an algorithm in a basement constantly taking documents and making them slightly longer based on trends detected in all documents.)
Therefore "the LLM can't feel shame" is true in the same way that "CyberDracula thirsts for the fluids of the innocent." Good news: Vampirism doesn't exist! Bad news: Curing Dracula is impossible, because the patient doesn't exist either. Go looking for the target mind we wanted to make more-intelligent or kinder, and it turns out to be a trick of the light.
The best we can do is change the generator process, so that the next story instead contains a different new character also named after Dracula (or a brand of LLM) that sounds smarter or is narrated with kinder actions.
Perhaps the end state is going to be from the last Hitchhiker's Guide to the Galaxy book, Mostly Harmless:
> Anything that thinks logically can be fooled by something else that thinks at least as logically as it does. The easiest way to fool a completely logical robot is to feed it with the same stimulus sequence over and over again so it gets locked in a loop. This was best demonstrated by the famous Herring Sandwich experiments conducted millennia ago at MISPWOSO (the MaxiMegalon Institute of Slowly and Painfully Working Out the Surprisingly Obvious).
> A robot was programmed to believe that it liked herring sandwiches. This was actually the most difficult part of the whole experiment. Once the robot had been programmed to believe that it liked herring sandwiches, a herring sandwich was placed in front of it. Whereupon the robot thought to itself, Ah! A herring sandwich! I like herring sandwiches.
> It would then bend over and scoop up the herring sandwich in its herring sandwich scoop, and then straighten up again. Unfortunately for the robot, it was fashioned in such a way that the action of straightening up caused the herring sandwich to slip straight back off its herring sandwich scoop and fall on to the floor in front of the robot. Whereupon the robot thought to itself, Ah! A herring sandwich...etc., and repeated the same action over and over again. The only thing that prevented the herring sandwich from getting bored with the whole damn business and crawling off in search of other ways of passing the time was that the herring sandwich, being just a bit of dead fish between a couple of slices of bread, was marginally less alert to what was going on than was the robot.
> The scientists at the Institute thus discovered the driving force behind all change, development and innovation in life, which was this: herring sandwiches. They published a paper to this effect, which was widely criticised as being extremely stupid. They checked their figures and realised that what they had actually discovered was “boredom”, or rather, the practical function of boredom. In a fever of excitement they then went on to discover other emotions, like “irritability”, “depression”, “reluctance”, “ickiness” and so on. The next big breakthrough came when they stopped using herring sandwiches, whereupon a whole welter of new emotions became suddenly available to them for study, such as “relief”, “joy”, “friskiness”, “appetite”, “satisfaction”, and most important of all, the desire for “happiness”. This was the biggest breakthrough of all.
> Vast wodges of complex computer code governing robot behaviour in all possible contingencies could be replaced very simply. All that robots needed was the capacity to be either bored or happy, and a few conditions that needed to be satisfied in order to bring those states about. They would then work the rest out for themselves.
I love that book. That said, the point is more subtle than that. Current LLM attention models are limited in their feedback. Adding a form of 'shame' feedback (the result is technically correct but morally bad, or some such) would help here, but I doubt the folks building these things would choose to do so.
From a certain and quite valid point of view, they have no mechanism for feedback at all. Every time you start a conversation you're starting in the same state, modulo the random numbers. At most you have this very, very vague loop in that the conversations for LLM 1.0 will be fed into the training set for LLM 2.0.
Even "shame" would only apply to the current session and disappear in the next one, or eventually be compacted away.
(Although honorable mention to Gemini's meltdown: https://x.com/AISafetyMemes/status/1953397827662414022 )
According to ChatGPT, researchers are working on models that remember personal directives across sessions, i.e., an actual personal assistant that gets to know you and your proclivities. So it's definitely on their radar. No idea how far along they are.
Unless that's something more than the already-common practice called "memories" that are text files held off to the side, that doesn't change what I meant. You can do all sorts of interesting things within the context window, but there's no feedback beyond that.
Even if a frontier-LLM-sized neural net could do something that would somehow change its net on a pervasive level in response to things that happen to it, nobody could possibly serve that in a cost-effective manner.
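To make the distinction concrete, here is a toy sketch, not how any real product works. Everything in it is hypothetical: `FROZEN_WEIGHTS`, `toy_reply`, and `Session` are made-up names, and a hash stands in for the model. The only things it tries to show are that a reply depends on frozen weights, the context window, and a seed, and that "memories" are just text placed into the context rather than a change to the weights.

```python
import hashlib

# Hypothetical stand-in for trained weights. Nothing below ever modifies it,
# which mirrors the claim that inference does not change the net.
FROZEN_WEIGHTS = "llm-1.0"


def toy_reply(context, seed):
    """Toy 'model': output is a pure function of weights, context, and seed."""
    key = repr((FROZEN_WEIGHTS, tuple(context), seed)).encode()
    return hashlib.sha256(key).hexdigest()[:8]


class Session:
    """One conversation. Its only state is the context window."""

    def __init__(self, memories=()):
        # Assumption: "memories" are plain text notes prepended to the context.
        self.context = list(memories)

    def say(self, user_text, seed=0):
        self.context.append(user_text)
        reply = toy_reply(self.context, seed)
        self.context.append(reply)
        return reply


if __name__ == "__main__":
    first = Session()
    first.say("Never fall for the gift card scam again.")
    second = Session()      # a new conversation
    print(second.context)   # [] : nothing carried over from the first one
```

In this sketch, the only way a lesson from one session reaches the next is if someone copies it into `memories`, and even then it is only more context, which is the sense in which there is no feedback beyond the window.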
Damn I had forgotten about this section of the book to the point that even reading it, I only recognised the style as typical Adams.
Guess that means I'm overdue for a re-read! Jaay!
I don't think shame is a helpful human emotion here in general. It prevents people from reaching out for help and makes many crimes much harder to tackle because the victims do not report it.
Also many victims fall for the exact same scam over and over again; to the point that lists of scam victims are sold and used as leads.
If a junior developer makes a dumb mistake that causes a mini-disaster, their brain makes it a priority to never make that same mistake again. They physically feel anxiety the next time they get into a similar situation, which serves as a very effective reminder not to do the same dumb thing.
LLMs make the same mistakes over and over. And even if/when they have the capacity to learn on the fly, they have no capacity to prioritize. It's all just a big haze of tokens.
That's my overall point. Humans have mistakes and then they have MISTAKES. And a whole continuum in between. LLMs just have a mish-mash of training data. I think before LLMs are more than just fancy parrots, we need to find an analogue to pain, shame, joy, fear, and the myriad other emotions that factor into human decision-making.
Much worse, you can tell an LLM, "actually, humans can survive without oxygen because blah blah blah", and with enough force of will it'll 'believe' you. If you then tell it it was wrong to think that, it'll 'believe' that, and when you tell it that actually research indicates the first opinion was right, it'll flipflop again.
No intelligent mind would ever behave like that, not even a 5 year old kid. Or hell, if you trick a dog a few times it'll get annoyed by your antics and go back to sleep on its pillow. An LLM, you can trick for aeons.
Yet somehow most of the AI industry has deluded itself into thinking that LLMs are on the threshold of general intelligence instead of being nothing but fancy stochastic parrots.
Shame is a wildly useful human emotion. Shame of letting down the tribal unit formed basically all of civilization. Shame is good.
Some shame is good and other shame is bad. Some guilt/shame is indicative of the development of the self, other guilt/shame is a cause and effect of stunted development of the self. I like Winnicott on this:
> How important it is, therefore, for a baby to have his mother consistently looking after him, looking after him over a period of time, surviving his attacks, and eventually there to be the object of the tender feeling and the guilt feeling and sense of concern for her welfare which come along in the course of time. Her continuing to be a live person in the baby’s life makes it possible for the baby to find that innate sense of guilt which is the only valuable guilt feeling, and which is the main source of the urge to mend and to re-create and to give.