To my understanding, if the material is publicly available or obtained legally (i.e., not pirated), then training a model with it falls under fair use.
Once training is established as fair use, it doesn't really matter if the license is MIT, GPL, or a proprietary one.
Fair use only applies in the United States (and Poland, and a very limited set of other countries)
https://en.wikipedia.org/wiki/Fair_use#/media/File:Fair_use_...
and it is certainly not part of the Berne Convention
In almost every country in the world, even time-shifting with your VCR or ripping your own CDs is copyright infringement.
Great, so the US and China can duke it out trying to create AGI or whatever, whereas most other countries are stuck in the past because of their copyright laws?
Most Commonwealth countries have fair dealing, which is similar to fair use but not identical: https://en.wikipedia.org/wiki/Fair_dealing
Importantly, "fair dealing" has no concept of "transformation"
(which is the linchpin of the sloppers' argument)
France and most of Europe have a fair-use-like private copying exception (https://fr.wikipedia.org/wiki/Copie_priv%C3%A9e), but they also impose a mandatory tax on every storage-capable medium sold, to recover the "lost fees" due to that exception.
> To my understanding, if the material is publicly available or obtained legally (i.e., not pirated), then training a model with it falls under fair use.
Is this legally settled?
Yes. There have been multiple court cases affirming fair use.
That is just the sort of point I am trying to make. Fair use is a copyright law issue, not a contractual one. If the GPL is a contract, then you are in breach of contract regardless of fair use or its equivalents.