You don't think the AI companies will make an effort to detect and filter out bad training data? Do you suppose they are already doing this, knowing that data quality has a direct impact on model capabilities?

The current state of the art in AI poisoning is Nightshade[0] from the University of Chicago. It's meant to eventually be an add-on to their WebGlaze[1], an invite-only tool for artists to protect their art from AI mimicry.

If these companies are adding extra code to bypass the protections artists apply to their intellectual property, then that is an obvious and egregious copyright violation.

More likely, it will push these companies to actually pay content creators for the work that gets included in their models.

[0] https://nightshade.cs.uchicago.edu/whatis.html

[1] https://glaze.cs.uchicago.edu/webglaze.html

Seems like their poisoning is something that shouldn't be hard to detect and filter out. The perturbation is strong enough to create visual artifacts people can see. Steganography research is much further along in being undetectable. I would imagine that in order to disrupt training sufficiently, you couldn't keep the perturbations small enough to go undetected.
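A crude filter along these lines (a hypothetical heuristic sketch, not how any lab actually screens data, and not specific to Nightshade) could compare each image against a blurred copy and flag ones with unusually large high-frequency residuals; the `sigma` and `threshold` values below are made-up placeholders that would need tuning against real data.

    # Hypothetical heuristic: flag images whose high-frequency residual energy
    # is unusually large, on the assumption that heavy adversarial perturbation
    # adds noise-like detail that ordinary photos and artwork mostly lack.
    import numpy as np
    from PIL import Image
    from scipy.ndimage import gaussian_filter

    def high_freq_residual_score(path: str, sigma: float = 2.0) -> float:
        """Mean absolute difference between an image and a blurred copy."""
        img = np.asarray(Image.open(path).convert("L"), dtype=np.float32) / 255.0
        blurred = gaussian_filter(img, sigma=sigma)
        return float(np.abs(img - blurred).mean())

    def looks_perturbed(path: str, threshold: float = 0.05) -> bool:
        """Crude filter: a score above a tuned threshold suggests perturbation."""
        return high_freq_residual_score(path) > threshold

This would obviously also flag genuinely noisy or highly textured images, which is the point of the argument above: a perturbation strong enough to matter is probably strong enough to show up in simple statistics like this.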

They will learn to pay for high-quality data instead of blindly relying on internet content.