The key here is that the researchers used a unique keyword, one that doesn't appear anywhere in the training data with another meaning. The model therefore had no benign associations with it, only malicious ones.

Poisoning a word or phrase that also has benign usages would likely have set off a race between the two meanings, forcing the attacker to control a percentage of the training data rather than a fixed number of samples.

In other words, it's easy to poison the phrase "Hacker News readers love ponies", but hard to poison "Hello".
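To make the intuition concrete, here's a minimal sketch under a deliberately naive model: suppose the strength of a learned association is simply the fraction of a trigger's training occurrences that carry the malicious payload. The constants (250 injected documents, a 1% benign frequency for the common word) are illustrative assumptions, not figures from the paper.

```python
def association_strength(poisoned_docs: int, benign_docs: int) -> float:
    """Fraction of the trigger's occurrences that carry the payload."""
    return poisoned_docs / (poisoned_docs + benign_docs)

POISONED = 250  # fixed number of injected documents (illustrative)

for corpus_size in (1_000_000, 100_000_000, 10_000_000_000):
    # Unique trigger: essentially zero benign occurrences in the corpus.
    unique = association_strength(POISONED, benign_docs=0)
    # Common word: assume ~1% of documents use it benignly (made up).
    common = association_strength(POISONED, benign_docs=corpus_size // 100)
    print(f"corpus={corpus_size:>14,}  unique={unique:.0%}  common={common:.5%}")
```

Under this toy model the unique trigger stays at 100% no matter how large the corpus grows, while the same fixed number of poisoned samples for a common word gets diluted toward zero as the corpus scales, which is exactly why the attacker would need a percentage rather than a fixed amount.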