Hacker News

It doesn't feel like the wikipedia thing is a good counterpoint. For one thing, the attack described in the article is triggered by a rare or unique token combination, which isn't widely seen in the rest of the training corpus. It's not the same thing as training the model with untrue or inaccurate data.

Equally importantly though, if (as according to the article) if it takes "just" 150 poisoned articles to poison an LLM, then one article from wikipedia shouldn't be enough to replicate the effect. Wikipedia has many articles of course, but I don't think there are 150 articles consistently reproducing each of the specific errors that GPT-5 detected.

edit: correction, 250 articles, not 150

dgfitz 4 days ago [ - ]

> the attack described in the article is triggered by a rare or unique token combination

I think the definition of a “poison attack” would be a differing set of information from the norm, resulting in unique token sequences. No?

Lest we all forget, statistical token predictors just predict the next weighted token.