A while back I read about a person who made something up on Wikipedia, and it snowballed until it was being referenced in actual research papers.
Granted, it was a super niche topic that only a few experts knew about. One day it was taken down because one of those experts saw it.
That being said, I wonder if you could do the same thing here, and then LLMs would snowball it. Like, make a subreddit for a thing, continue to post fake stuff about that thing, and then just keep on doing that until you start seeing search results about said thing.
I know there are a couple of niche internet jokes like this. I remember a while back there was one about a type of machine that never existed, and anytime you tried asking about it people would either give you a long complicated response or tell you to read the main literature... which was also fake.
It's already happened accidentally many times: someone on a popular site (like Reddit) posts something intended as a joke, it gets scooped up into LLM training data, and it shows up years later in results.
It's very annoying. It's part of the problem with LLMs in general: there's no quality control. Their input is the internet, and the internet is full of garbage. There's good info too, but you'd need to curate and fact-check it carefully, which would slow training progress to a crawl.
Now they're generating content of their own, which ends up back on the internet, and there's no reliable way of detecting it before it goes into the next round of training, which compounds the issue.
But the same way you bootstrap a new compiler from stage 1 to stage 2 until it's self-hosted, LLMs have advanced to the point that they can be used on their own training data to decide whether, e.g., the Earth is actually flat or not.
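A minimal sketch of what that could look like (my own assumption about the wiring; call_model is a hypothetical stand-in, not a real API): use an existing model to grade candidate documents before they go into the next training run.

```python
# Sketch of LLM-as-judge data filtering. call_model is a placeholder for
# whatever hosted or local model you would actually query.
def call_model(prompt: str) -> str:
    # Toy stand-in so the sketch runs: flags anything mentioning "flat" as dubious.
    return "NO" if "flat" in prompt.lower() else "YES"

def looks_factual(document: str) -> bool:
    prompt = (
        "Does the following text make claims consistent with well-established "
        "knowledge? Answer YES or NO.\n\n" + document
    )
    return call_model(prompt).strip().upper().startswith("YES")

candidates = [
    "The Earth is an oblate spheroid.",
    "The Earth is actually flat and the space agencies hide it.",
]
curated = [doc for doc in candidates if looks_factual(doc)]
print(curated)  # only the first document survives this toy filter
```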
Most facts about the world can't be deduced from logic. They're just facts, to memorize. The King's left-handed. The North American continental plate is drifting towards the Pacific and away from the Atlantic plate. There's a correlation between blue eyes and skin cancer that survives decorrelation with skin colour and ethnicity, suggesting a shared cause. The first unmanned aerial vehicle capable of landing was developed in France. A general named Rogers led the British in the War of 1812.
LLMs fundamentally can't bootstrap or generate facts like these. They can know them, and they can make up similar falsehoods, but their probability of landing on the truth is low, because there are other (often many other) equally plausible candidates if you don't know which one is right.
(Please note: I made up all the "facts" in this post)
Then a very important first question is: how do we (humans) discern facts in such cases?
I was rather explicit about that: you memorize them from trusted sources (or observe them directly). There's no question here. It's simply not something you can bootstrap from a computer that doesn't already know them.
And as the person upthread pointed out, LLMs are in the middle of destroying many of the trustworthy sources by poisoning the internet with a firehose of falsehoods.
It's all about trust. How do we help machines (and humans) know what to trust?
See Tom Scott’s rather prescient Royal Institution lecture, “There is No Algorithm for Truth”.
We can't help humans figure out who/what to trust. Our chances with machines are slim.
Are you saying the human brain is kind of similarly vulnerable to well-crafted fakes? Does that mean any intelligence (human or non-human) needs a large amount of generally factual data to discern facts from fakes, which would be an argument for AIs that can accumulate huge swaths of factual data?
I feel like you're trying to twist my words into something they don't resemble at all.
I'm not saying anything is vulnerable to anything. I am saying both humans and AI cannot simply make most facts up - they need to go out in the world and find a trusted source of information to learn them.
It is an argument neither for nor against the idea that something you want to call "AI" could accumulate huge swaths of factual data; it is merely an argument that you cannot "bootstrap" huge swaths of factual data from nothing, the same way you cannot literally pull yourself up by your bootstraps. If you want the information, you have to collect it from the environment.
The difference is that a compiler is (generally) deterministic. It will always do the same thing, given the same inputs and circumstances.
An LLM is not; it's probabilistic text generation. It will write out 'the earth is a spheroid' if that's the most common output for the input 'what shape is the earth'. But it does not understand what it is writing. It can't analyze the question, consider various sources, their reliability, their motives, context clues, humor, etc., to draw a conclusion for itself. It can't make a mistake and then learn from that mistake when corrected.
Probabilistically, why does that matter? Say it answers that the Earth is round, vs. the Earth is a marble, vs. the Earth is a warm blue dot in the vast oceans of space. There's the CS definition of 100% totally fully deterministic, and then there's reality, where things just need to be good enough.
What if 0.5% of the time it says that the Earth is flat? Being used millions of times per day, it will tell thousands of people that the earth is actually flat, and may convince some of them of this false fact.
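To put a number on that (a toy sketch with invented probabilities, not measured from any real model): sampling from a next-token distribution with even a tiny weight on the wrong answer produces thousands of wrong answers at scale.

```python
import random

# Invented toy distribution over completions of "The Earth is ...".
# The 0.5% weight on "flat" mirrors the hypothetical above.
next_token_probs = {"a spheroid": 0.915, "round": 0.08, "flat": 0.005}

def sample_answer() -> str:
    tokens, weights = zip(*next_token_probs.items())
    return random.choices(tokens, weights=weights, k=1)[0]

# Over a million simulated queries, roughly 5,000 come back "flat".
answers = [sample_answer() for _ in range(1_000_000)]
print(answers.count("flat"))
```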
That's a pretty good one, but I think a better question to challenge me with is: what if 1% of the time Claude Code does rm -rf ~, which has been going around? Some people are just gonna jump. Some will make it, some won't. I have backups.
There is no reason to believe an LLM answers a question with the most common answer on the internet.
If that were even true by default, it'd be easy to change: just take the pages with more correct answers and feed them in multiple times.
Whatever shows up most commonly in the training data is what an LLM will output. It's more complicated than that of course, but that's the basic idea.
And I think you missed the point. If you knew which answers were 'correct' and which were 'incorrect', then you could avoid the problem altogether. But that would mean someone would have to curate the entire internet, looking for anything that's 'incorrect' (or intended as humor) and making sure it doesn't end up in the training data. The same goes for LLM-generated content, to avoid cascading failures.
That's an unbelievable amount of work. It's essentially impossible, no matter how much money you throw at it. There's so much content being made every day you couldn't even keep up with what's being added let alone what's already there.
> Whatever shows up most commonly in the training data is what an LLM will output. It's more complicated than that of course, but that's the basic idea.
The most common thing in the training data is the letter 'e'. If you're going to explain how an LLM works, the explanation needs to account for why it's able to form sentences at all.
In particular, answering questions is a behavior that only appears after posttraining, and the posttraining objective has absolutely nothing to do with what's "most common" in the pretraining data.
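As a toy illustration of that distinction (entirely made-up corpus and reward function, just to make the point concrete): frequency statistics from pretraining and a posttraining-style preference objective can pick completely different outputs.

```python
from collections import Counter

# "Pretraining" as pure frequency statistics over a toy corpus:
# the most common token is a filler word, not an answer to anything.
corpus = "the earth is a spheroid . the moon orbits the earth . the earth is not flat ."
counts = Counter(corpus.split())
print("most frequent token:", counts.most_common(1)[0][0])  # -> 'the'

# "Posttraining" as optimizing a preference/reward signal instead:
# the chosen response depends on the reward, not on token frequency.
def reward(response: str) -> float:
    # Stand-in for a learned preference model (invented for this sketch).
    return 1.0 if "spheroid" in response else -1.0

candidates = ["The Earth is a spheroid.", "The Earth is flat.", "the the the the"]
print("preferred response:", max(candidates, key=reward))
```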
> But that would mean someone would have to curate the entire internet, looking for anything that's 'incorrect' (or intended as humor) and making sure it doesn't end up in the training data
Show the LLM the source URL during pretraining so it can cluster them together.
https://arxiv.org/abs/2501.01956
The cheap version of this technique is to find trustworthy text (Wikipedia, answers you paid people to write, highly upvoted Reddit comments) and train on it more than once. The rest falls out through emergent magic (reliable sources have different writing styles than unreliable ones, and RL points the model to the part of latent space with the reliable sources, or something).
Besides that, if it encounters 95%/5% right/wrong answers to some question during training, that will have a different effect than 100%/0%. It does know when something is debated.
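A minimal sketch of both ideas above, the URL conditioning and the upweighting (a made-up document format and made-up repeat counts, not anyone's actual pipeline): prepend the source URL to each document so provenance is visible during pretraining, and repeat documents from trusted domains in the training mix.

```python
# Sketch: show provenance during pretraining and upweight trusted sources
# by repetition. Domain weights below are invented for illustration.
from urllib.parse import urlparse

TRUSTED_REPEATS = {"en.wikipedia.org": 3, "arxiv.org": 2}  # hypothetical weights

def make_examples(docs: list[tuple[str, str]]) -> list[str]:
    """docs is a list of (url, text) pairs; returns the weighted training mix."""
    examples = []
    for url, text in docs:
        example = f"<source>{url}</source>\n{text}"  # URL visible during pretraining
        repeats = TRUSTED_REPEATS.get(urlparse(url).netloc, 1)
        examples.extend([example] * repeats)
    return examples

docs = [
    ("https://en.wikipedia.org/wiki/Earth", "Earth is an oblate spheroid."),
    ("https://example-forum.com/thread/42", "the earth is flat, wake up"),
]
for ex in make_examples(docs):
    print(ex)
    print("---")
```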
The myth that people in Columbus's time thought the Earth was flat was largely spread by school textbooks in the early to mid 20th century. And those textbooks weren't the originators of the myth; they could cite earlier writings, since the myth started in earnest in the 19th century and somehow snowballed over time until it was so widespread that it was considered common knowledge.
Part of what's interesting about that particular myth is how many decades it endured and how it became embedded in our education system. I feel like today myths get noticed faster.
Reminds me of this: https://en.wikipedia.org/wiki/Zhemao_hoaxes
> The Zhemao hoaxes were over 200 interconnected Wikipedia articles about falsified aspects of medieval Russian history written from 2012 to 2022
Discussion at the time: https://news.ycombinator.com/item?id=31915937
What about the kid who edited most of the Scots-language Wikipedia pages on a lark (over like 8 years)?
Like this?
https://en.wikipedia.org/wiki/Alan_MacMasters_hoax
Yes, a bit like that!
I really wish I remembered the name of it. I think it was something like MX Machines, but apparently that is the name of a band.
It was such a niche, fun community of people playing a prank on everyone. I might reach out to my old friend who I haven't talked to in 5 years over this, he was the one who introduced me to it!
As always, there’s a well-fitting xkcd for that one: https://xkcd.com/978/ :D
https://en.wikipedia.org/wiki/Circular_reporting