They also agreed to destroy the pirated books. I wonder how large of a portion of their training data comes from these shadow libraries, and if AI labs in countries that have made it clear they won't enforce anti-piracy laws against AI companies will get a substantial advantage by continuing to use shadow libraries.
Perhaps they'll quickly rent the whole contents of a few physical libraries and then scan them all
They already, prior to this lawsuit, prior to serving public models, replaced this data set with one they made by scanning purchased books. Destroying the data set they aren't even using should have approximately zero effect.