If you know the domain the LLM operates in, it's probably fairly easy.

For example, let's say the IRS has an LLM that reads over tax filings. With a couple hundred poisoned SSNs, you can nearly guarantee one of them will be read, and it's not going to be that hard to poison a few hundred specific SSNs.
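To put a rough number on "nearly guarantee", here is a back-of-the-envelope sketch. It assumes each poisoned SSN belongs to a filing that the model reads independently with the same probability; the function name and the 2% figure are made up for illustration, not anything known about a real system.

```python
def p_at_least_one_read(num_poisoned: int, read_fraction: float) -> float:
    """Chance that at least one poisoned record is among those the model reads.

    Assumes (hypothetically) that each record is read independently
    with probability read_fraction.
    """
    if num_poisoned < 0:
        raise ValueError("num_poisoned must be non-negative")
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must be between 0 and 1")
    # P(none are read) = (1 - read_fraction) ** num_poisoned
    return 1.0 - (1.0 - read_fraction) ** num_poisoned


if __name__ == "__main__":
    # Hypothetical numbers: 200 poisoned SSNs, model sees 2% of filings.
    print(p_at_least_one_read(200, 0.02))
```

Under those assumptions, 200 poisoned SSNs with the model seeing only 2% of filings already works out to roughly a 98% chance that at least one gets read.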

Same thing goes for rare but known-to-exist names, addresses, etc.

Bobby Tables is back, basically.

Speaking of which, my SSN is 055-09-0001