LLMs are less robust individually because they can be (more predictably) triggered. Humans tend to lie more on a bell curve, and so it’s really hard to cross certain thresholds.
Classical conditioning experiments seem to show that humans (and other animals) are fairly easily triggered as well. We humans tend to think ourselves unique when we are not.
Only individually if significantly more effort is given for specific individuals - and there will be outliers for whom it is essentially impossible.
The challenge here is that a few specific poison documents (out of billions) can get, say, 90% or more of LLMs to behave in specific pathological ways.
It’s nearly impossible to get 90% of humans to behave the same way on anything without massive amounts of specific training across the whole population, plus ongoing specific reinforcement.
Hell, even if you gave people large packets of cash and told them to keep it, I’d be surprised if 90% of them actually did - you’d have the ‘it’s a trap’ folks, the ‘god wouldn’t want me to’ folks, the ‘it’s a crime’ folks, etc.
> Only individually if significantly more effort is given for specific individuals
I think significant influence over mass media like television, social media, or the YouTube, TikTok, or Facebook algorithms[1] is sufficient.
1: https://journals.sagepub.com/doi/full/10.1177/17470161155795...
You can do a lot with 30%.
Still not the same thing as what we’re talking about, however.
I'd argue that it's at least analogous. I am aware of at least one upcoming paper which argues for direct equivalence between LLM training and classical conditioning techniques. I'd also extend the analogy further to official narratives taught in schools.
Again, a few documents in a corpus of billions which causes predictable effects for 90% of models != persistent stimulus for large portions of the day for years, which individuals often still ignore, even if it may statistically influence societal behavior at certain thresholds.
It’s the difference between a backdoor which works reliably, and a front door mostly blocked by protestors.
> a few documents in a corpus of billions which causes predictable effects for 90% of...
Sounds like the Texas textbook controversy: https://www.historynewsnetwork.org/article/the-texas-textboo...