So could they host Library Genesis and other pirate sources on a local server and use that as training data, then? That's the level I'm talking about, much like Common Crawl and the Reddit archive.
Oh, yeah, no, you can't. The data has to be obtained legally. Common Crawl and the Reddit archives should be fine, though. Breaking a site's TOS doesn't count as obtaining the data illegally.