Hacker News

I don't understand how this improves performance. Can you elaborate?

We find such examples in the existing pretraining data and remove them. Do you think that won't work?