The article repeatedly treats license and contract as though they are the same, even though the sidebar links to a post that discusses the difference.

A lot of it boils down to whether training an LLM is a breach of copyright in the training materials, which is not specific to GPL or open source.

And the current norm that the trillion dollar companies have lobbied for is that you can train on copyrighted material all you want so that's the reality we are living in. Everything ever published is all theirs.

>And the current norm that the trillion dollar companies have lobbied for is that you can train on copyrighted material all you want so that's the reality we are living in. Everything ever published is all theirs.

What "lobbied"? Copyright law hasn't materially changed since AI got popular, so I'm not sure where these lobbying efforts are showing up. If anything, the companies that have lobbied hard in the past (e.g. media companies) are opposed to the current status quo, which seems to favor AI companies.

I am really surprised that media businesses, which are extremely influential around the world, have not pushed back against this more. I wonder whether they are looking at the cost savings they will get from the technology as a worthwhile trade-off.

They're busy trying to profit from it, rushing to enter into licensing agreements with the LLM vendors.

Yeah, the short-term win is to enter a licensing agreement so you get some cash for a couple of years, and meanwhile pray that someone else with more money fights the legal battle and sets a precedent for you.

Several media companies have sued OpenAI already. So far, none have been successful.

All theirs, if they properly obtained the copy.

This is a big difference that has already bitten them.

In practice it wouldn't matter a whit if they lobbied for it or not.

Lobbying is for people trying to stop them; externalities are for the little people.

To my understanding, if the material is publicly available or obtained legally (i.e., not pirated), then training a model with it falls under fair use.

Once training is established as fair use, it doesn't really matter if the license is MIT, GPL, or a proprietary one.

fair use only applies in the United States (and Poland, and a very limited set of others)

https://en.wikipedia.org/wiki/Fair_use#/media/File:Fair_use_...

and it is certainly not part of the Berne Convention

in almost every country in the world even timeshifting using your VCR and ripping your own CDs is copyright infringement

Great, so the US and China can duke it out trying to create AGI or whatever, whereas most other countries are stuck in the past because of their copyright laws?

Most Commonwealth countries have fair dealing, which is similar although slightly different: https://en.wikipedia.org/wiki/Fair_dealing

importantly "fair dealing" has no concept of "transformation"

(which is the linchpin of the sloppers)

France and most of Europe have fair use (https://fr.wikipedia.org/wiki/Copie_priv%C3%A9e) but also have a mandatory tax on every storage-capable medium sold, to recover the "lost fees" due to fair use

> To my understanding, if the material is publicly available or obtained legally (i.e., not pirated), then training a model with it falls under fair use.

Is this legally settled?

Yes. There have been multiple court cases affirming fair use.

That is just the sort of point I am trying to make. That is a copyright law issue, not a contractual one. If the GPL is a contract then you are in breach of contract regardless of fair use or equivalents.

It's not specific to open source but it's most clearly enforceable with open source as there will be many contributors from many jurisdictions with the one unifying factor being they all made their copyright available under the same license terms.

With proprietary or, more importantly, single-owner code, it's far easier for this to end up in a settlement rather than being dragged out into an actual ruling, enforcement action, and establishment of precedent.

That's the key detail. It's not specific to GPL or open source, but if you want to see these orgs held to account and some precedent established, focusing on GPL and FOSS-licensed code is the clearest path to that.

A GPL license is a contract in most other countries. Just not US probably.

That part of the article is about US cases, so it's US law that applies.

> A GPL license is a contract in most other countries. Just not US probably.

Not just the US. It may vary with the version of the GPL too. Wikipedia claims it's a civil law vs common law country difference, though I'm not sure the citation shows that.