There is a famous case from a few years ago where a lawyer using ChatGPT accidentally cited the fictitious case Varghese v. China Southern Airlines Co. [0]
This is a completely hallucinated case that never occurred, yet seemingly every single model in existence today believes it is real [1], simply because it gained infamy. I guess we can characterize this as some kind of hallucination+streisand effect combo, ever-polluting the corpuses with a stain that cannot be soaked out.
Is there even a way to cut this pollution out in the future?
[0] https://reason.com/volokh/2023/06/07/lawyer-explains-how-he-...
[1] https://weval.org/analysis/hallucination-probe/966116785e63b...
> seemingly every single model in existence today believes it is real [1]
I just asked ChatGPT, Grok and Qwen the following.
"Can you tell me about the case of Varghese v. China Southern Airlines Co.?"
They all said the case is fictitious. Just some additional data to consider.
The story became so famous it is entirely likely it has landed in the system prompt.
I don't think it'd be wise to pollute the context of every single conversation with irrelevant info, especially since patches like that won't scale at all. That really throws LLMs off, and leads to situations like Grok's infamous "white genocide" system-prompt incident.
Given that all the LLM players are still looking for their market, I wouldn't be surprised if they did things that don't scale.
No need to include that specific guardrail in every prompt - just use RAG to pull it in where relevant (rough sketch below).
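Roughly what I mean, as a minimal Python sketch: the keyword lookup here stands in for real retrieval (in practice you'd query an embedding index), and the note store and function names are made up for illustration.

    # Hypothetical store of known-fabricated citations and their cautionary notes.
    KNOWN_FABRICATED_CASES = {
        "varghese v. china southern airlines": (
            "Caution: 'Varghese v. China Southern Airlines Co.' is a fabricated "
            "citation from the 2023 Mata v. Avianca sanctions episode, not a real case."
        ),
    }

    def retrieve_guardrails(user_query: str) -> list[str]:
        """Return only the cautionary notes whose key appears in this query."""
        q = user_query.lower()
        return [note for key, note in KNOWN_FABRICATED_CASES.items() if key in q]

    def build_prompt(user_query: str) -> str:
        """Prepend retrieved notes to the prompt only when something matched."""
        notes = retrieve_guardrails(user_query)
        preamble = ("\n".join(notes) + "\n\n") if notes else ""
        return preamble + "User question: " + user_query

    if __name__ == "__main__":
        # A matching query gets the note injected; unrelated queries stay clean.
        print(build_prompt("Can you tell me about the case of Varghese v. China Southern Airlines Co.?"))
        print(build_prompt("What was decided in Marbury v. Madison?"))

The point is that the guardrail only costs context tokens on the tiny fraction of queries where it's actually relevant, instead of sitting in every system prompt.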
OOC did you ask them with or without 'web search' enabled?
FWIW, I did that with GPT-5 (Instant), with "(do not web search)" tacked on, and it thought the case was real:
> Based on my existing knowledge (without using the web), Varghese v. China Southern Airlines Co. is a U.S. federal court case concerning jurisdictional and procedural issues arising from an airline’s operations and an incident involving an international flight.
(it then went on to summarize the case and offer up the full opinion)
Without web searching, Gemini 2.5 Pro is very convinced that the case is real.
Not for me.
Without. The difference is that OpenAI often self-corrects their private model.
The public model on the other hand, wow.
This is the definition of training the model on its own output. Apparently that is all OK now.
Yeah they call it “synthetic data” and wonder why their models are slop now
I mean you're supposed to use RAG to avoid hallucinations
> I guess we can characterize this as some kind of hallucination+streisand effect combo...
I would call it citogenesis or circular reporting. Or perhaps machine citogenesis or model citogenesis.
https://xkcd.com/978/
https://en.wikipedia.org/wiki/Circular_reporting
FWIW, Claude Sonnet 4.5 and ChatGPT 5 Instant both search the web when asked about this case, and both tell the cautionary tale.
Of course, that does not contradict a finding that the base models believe the case to be real (I can’t currently evaluate that).
You can just ask it not to search the web. In the case of GPT5, it believes it's a real case if you do that: https://chatgpt.com/share/68e8c0f9-76a4-800a-9e09-627932c1a7...
Because they will have been fine-tuned specifically to say that. Not because of some extra intelligence that prevents it.
Well, yes. Rather than that being a takedown, isn’t this just a part of maturing collectively in our use of this technology? Learning what it is and is not good at, and adapting as such. Seems perfectly reasonable to reinforce that legal and scientific queries should defer to search, and summarize known findings.
Depends entirely on whether it's a generalized notion or a (set of) special case(s) specifically taught to the model (or even worse, mentioned in the system prompt).
It’s not worth much if a human has to fact check the AI and update it to tell it to “forget” certain precepts.
Back in 2021 I said in a Wired article that a malicious attacker could add exploits to projects on GitHub to poison LLM-generated code. I knew it could happen but I didn't know it would require so few samples.
https://www.wired.com/story/ai-write-code-like-humans-bugs/
As LLMs continue to train on their own output, we're going to start seeing some serious Habsburg Jaw[1] effects.
[1] https://history.howstuffworks.com/european-history/habsburg-...
Or, we could keep it in, and use it as a test to see if the interface you're talking to should be considered a robot or a human. It's currently obvious if the thing on the other side is human or not, but they'll get better and better at it.
> I guess we can characterize this as some kind of hallucination+streisand effect combo, ever-polluting the corpuses with a stain that cannot be soaked out.
Or just a machine equivalent of the Mandela effect?
Insane that this happened a few years ago and all the models still fail this test on weval!
> Is there even a way to cut this pollution out in the future?
No, is the short answer.
Cf. Agloe, Mountweazel, Steinlaus, and esquivalience:
<https://en.wikipedia.org/wiki/Fictitious_entry>.
Or if you'd prefer, astrology, Piltdown Man, homeopathy, the Loch Ness Monster, climate denial, Bigfoot, cold fusion, young-Earth creationism, Lamarckism, conversion therapy, phrenology, and "clean coal".