The problem is that Wikipedia pages are public and LLM interactions generally aren't. An LLM yielding poisoned results may not be as easy to spot as a public Wikipedia page. Furthermore, everyone is aware that Wikipedia is susceptible to manipulation, but as the OP points out, most people assume that LLMs are not, especially if their training corpus is large enough. Not knowing that intentional poisoning is not only possible but relatively easy, combined with poisoned results being harder to find in the first place, makes it a lot less likely that poisoned results are noticed and responded to in a timely manner. Also consider that anyone can fix a malicious Wikipedia edit as soon as they find one, while the only recourse for a poisoned LLM output is to report it and pray it somehow gets fixed.

  Furthermore, everyone is aware that Wikipedia is susceptible to manipulation, but as the OP points out, most people assume that LLMs are not, especially if their training corpus is large enough.
I'm not sure this is true. The opposite may be true.

Many people assume that LLMs are programmed by engineers (biased humans working at companies with vested interests) and that Wikipedia mods are saints.

I don't think anybody who has seen an edit war thinks wiki editors (not mods, mods have a different role) are saints.

But a Wikipedia page cannot survive stating something completely outside the consensus. Bizarre statements cannot survive because they require reputable references to back them.

There's bias in Wikipedia, of course, but it's the kind of bias already present in the society that created it.

  I don't think anybody who has seen an edit war thinks wiki editors (not mods, mods have a different role) are saints.
I would imagine that fewer than 1% of people who view a Wikipedia article in a given month have knowingly 'seen an edit war'. If I'm right, you're not talking about the vast majority of Wikipedia users.

  But a Wikipedia page cannot survive stating something completely outside the consensus. Bizarre statements cannot survive because they require reputable references to back them.
This is untrue. Wikipedia's rules and real-world history show that 'bizarre' or outside-the-consensus claims can persist, sometimes for months or years; the sourcing requirements do not prevent this.

Some high-profile examples of false information persisting on Wikipedia:

- The Seigenthaler incident: a fabricated bio linking journalist John Seigenthaler to the Kennedy assassinations remained online for about 4 months before being fixed: https://en.wikipedia.org/wiki/Wikipedia_Seigenthaler_biograp...

- The Bicholim conflict: a detailed article about a non-existent 17th-century war that survived *five years* and even achieved “Good Article” status: https://www.pcworld.com/article/456243/fake-wikipedia-entry-...

- Jar’Edo Wens (a fake Aboriginal deity) lasted almost 10 years: https://www.washingtonpost.com/news/the-intersect/wp/2015/04...

- (Pulitzer-winning) novelist Philip Roth publicly complained that Wikipedia refused to accept his correction about the inspiration for *The Human Stain* until he published an *open letter in The New Yorker*. The false claim persisted because Wikipedia only accepts 'reliable' secondary sources: https://www.newyorker.com/books/page-turner/an-open-letter-t...

Larry Sanger's 'Nine theses' explains the problems in detail: https://larrysanger.org/nine-theses/

Isn't the fact that there was controversy about these, rather than blind acceptance, evidence that Wikipedia self-corrects?

If you see something wrong in Wikipedia, you can correct it and possibly enter a protracted edit war. There is bias, but it's the bias of the anglosphere.

And if it's a hot or sensitive topic, you can bet the article will have lots of eyeballs on it, contesting every claim.

With LLMs, nothing is transparent and you have no way of correcting their biases.

  Isn't the fact that there was controversy about these, rather than blind acceptance, evidence that Wikipedia self-corrects?
No. Because:

- if it can survive five years, then it can pretty much survive indefinitely

- beyond blatant falsehoods, there are many other issues that don't self-correct (see the link I shared for details)

I think only very obscure articles can survive for that long, merely because not enough people care about them to watch/review them. The reliability of Wikipedia is inversely proportional to the obscurity of the subject, i.e. you should be relatively safe if it's a dry but popular topic (e.g. science), wary if it's a hot topic (politics, but they tend to have lots of eyeballs so truly outrageous falsehoods are unlikely), and simply not consider it reliable for obscure topics. And there will be outliers and exceptions, because this is the real world.

In this regard, it's no different than a print encyclopedia, except revisions come sooner.

It's not perfect and it does have biases, but again this seems to reflect societal biases (of those who speak English, are literate and have fluency with computers, and are "extremely online" enough to spend time editing Wikipedia). I've come to accept English Wikipedia's biases are not my own, and I mentally adjust for this in any article I read.

I think this is markedly different to LLMs and their training datasets. There, obscurity and hidden, unpredictable mechanisms are the rule, not the exception.

Edit: to be clear, I'm not arguing there are no controversies about Wikipedia. I know there are cliques that police the wiki, enforce their points of view, and use their insider knowledge of the rules to collude and drive away dissenters. Oh well, such is the nature of human groups.

  but again this seems to reflect societal biases (of those who speak English, are literate and have fluency with computers, and are "extremely online" ...)
I don't believe that Wikipedia editorial decisions represent a random sample of English speakers who have fluency with computers.

Again, read what Larry Sanger wrote, and pay attention to the examples.

I've read Sanger's article, and in fact I acknowledge what he calls systemic bias; I also mentioned hidden cliques in my earlier comment, which are unfortunately a fact of human society. I think Wikipedia's consensus does represent the non-extremist consensus of English-speaking, extremely online people; I'm fine with sidelining extremist beliefs.

I think Sanger's other opinions re: neutrality, public voting on articles, etc., are debatable to say the least (I don't believe people voting on articles means anything beyond what Facebook likes mean, so I wonder what Sanger is proposing here; true neutrality is impossible in any encyclopedia; presenting every viewpoint as equally valid is a fool's errand and fundamentally misguided).

But let's not make this debate longer: LLMs are fundamentally more obscure and opaque than Wikipedia is.
