Isn't this good news, if anything? Performance can only go up now.

I don't understand how this helps in improving performance. Can you elaborate?

We find such examples in the existing pre-training data and remove them. Don't you think that would work?