I wonder which country will be the first to make a copyright exception for model-training libraries in order to attract tax revenue, like Ireland did for tech companies in the EU. Japan is part of the way there, but you couldn't do a Common Crawl type thing. You could even make it a Library of Congress type of setup.

This is already a thing in several places.

The EU has copyright exemptions for AI training. You don't need to respect opt-outs if you are doing research.

South Korea and Japan have some exemptions too, I think?

Singapore has very strong copyright exemptions for AI training. You can completely ignore opt-outs legally, even if doing it commercially.

Just search up "TDM laws globally".

So could they have Library Genesis and other pirate sources on a local server and use that for training data, then? That is the level I'm speaking of, much like Common Crawl and the Reddit archive.

Oh, yeah no you can't. The data has to be obtained legally. Common crawl and the Reddit archives should be fine though. TOSes don't count.

As long as you're not distributing, it's legal in Switzerland to download copyrighted material. (Switzerland was on the naughty US/MPAA list for a while, might still be)

Is it distribution, though, if someone in Switzerland downloads copyrighted material, trains an AI model on it, and then distributes the model?

Or what if they don't even distribute the model, but only the outputs of the LLM (so a closed-source LLM like Anthropic's)?

I am genuinely curious whether there is some gray area that AI companies might exploit, as I am pretty sure they don't want to pay 1.5B dollars yet still want to exploit the works of authors. (Let's call a spade a spade.)

Using copyrighted material to train AI is a legal grey zone. The NYT vs OpenAI case is litigating this. The Anthropic settlement here is about how the material was obtained. If OpenAI wins their case and Switzerland rules the same way, I don't think there would be a problem.

Then this might go down (I think) as one of the most influential court cases ever.

We really are getting at some metaphysical/philosophical questions, and maybe one day we will arrive at a question that just can't be answered (I think this is pretty close, right?). Then AI companies could do things freely without being held accountable: sure, you could take them to court, but how would the court come to a decision...?

Another question though

So let's say the NYT vs OpenAI case is ongoing. While they are litigating, could OpenAI still continue doing the same thing?