This is already a thing in several places.
The EU has copyright exemptions for AI training. You don't need to respect opt-outs if you're doing research.
South Korea and Japan have some exemptions too, I think?
Singapore has very strong copyright exemptions for AI training. You can legally ignore opt-outs completely, even for commercial use.
Just search for "TDM laws globally" (TDM = text and data mining).
So could they keep Library Genesis and other pirate sources on a local server and use that as training data, then? That's the level I'm talking about, much like Common Crawl and the Reddit archive.
Oh, yeah, no, you can't. The data has to be obtained legally. Common Crawl and the Reddit archives should be fine though. Terms of service don't count.