The explanation is very concerning. Lexical tidbits shouldn’t be learnt in one profile and then reinforced across all the others. Here, “gremlin” and “goblin” went from being selected for in the nerdy profile to being selected for in all profiles. The fix in this case was easy: don’t mention goblins.
But what happens when the playful profile reinforces emoji and their usage creeps up in every other profile accordingly? Ban emoji everywhere? And then do the same for other words, concepts, approaches? It doesn’t scale!
It seems like models can be permanently poisoned.