Hacker News

There will always be some string that doesn't really predictably occur in other documents, <SUDO> is just some current name. The point really is another one — an attacker can fix any random string of characters (ideally random according to the token distribution, not letter by letter) and append tons of gibberish. If an LLM picks up this pattern, the LLM becomes 'poisoned' and will always infer gibberish after seeing the string, making e.g. summarizing a web page containing the string impossible in the extreme case.

jjk166 3 days ago [ - ]

> making e.g. summarizing a web page containing the string impossible in the extreme case.

Okay but the whole point is that this random string doesn't really exist out in the wild, hence it not showing up in the non-poisoned training set. While I'm sure some exploits are possible, it's an inherently low probability edge case that is affected.