To my understanding, if the material is publicly available or obtained legally (i.e., not pirated), then training a model with it falls under fair use, at least in the US and some other jurisdictions.

If the training is established as fair use, the underlying license doesn't really matter. The term you added would likely be void or deemed unenforceable if someone ever brought it to court.

It depends on the license terms. If you obtained the material legally under a license whose terms you agreed to, and those terms rule out training, then using it for that purpose would not be legal.

But this is all grey area… https://www.authorsalliance.org/2023/02/23/fair-use-week-202...

This is at least murky, since a lot of pirated material is “publicly available”. Certainly some has ended up in the training data.

It isn't? You have to break the law to get it. It's publicly available like your TV is if I were to break into your house and avoid getting shot.

That isn't even remotely a sensible analogy. Equating copyright violation with stealing physical property is an extremely failed metaphor.

One of the craziest experiences in this "post AI" world is to see how quickly a lot of people in the "information wants to be free" or "hell yes I would download a car" crowds pivoted to "stop downloading my car, just because it's on a public and openly available website doesn't make it free".

Maybe you have some legalistic point that escapes comprehension, but I certainly consider my house to be very much private and the internet public.

I wouldn't say this is settled law, but it looks like this is one of the likely outcomes. It might not be possible to write a license to prevent training.

Fair use was meant for citing and so on, not for ripping off 100% of the content.

Copyright protects the expression of an idea, not the idea itself. Therefore, an LLM transforming concepts it learned into a response (a new expression) would hardly qualify as copyright infringement in court.

This principle is also explicitly declared in US law:

> In no case does copyright protection for an original work of authorship extend to any idea, procedure, process, system, method of operation, concept, principle, or discovery, regardless of the form in which it is described, explained, illustrated, or embodied in such work. (Section 102 of the U.S. Copyright Act)

https://www.copyrightlaws.com/are-ideas-protected-by-copyrig...
