Sometimes, LLMs actually generate copyright headers in their output as well - lol - like in this PR, which was the subject of a recent HN post [1]
https://github.com/ocaml/ocaml/pull/14369/files#diff-062dbbe...
I once had a well-known LLM reproduce pretty much an entire file from a well-known React library verbatim.
I was writing code in an unrelated programming language at the time, and the bizarre inclusion of that particular file in the output was presumably because the library's name was very similar to a keyword I was using in my existing code. The experience did not fill me with confidence about the abilities of contemporary AI. ;-)
However, it did clearly demonstrate that LLMs with billions or even trillions of parameters can embed enough information to reproduce some of their training material verbatim, or very close to it.
So what? I can probably reproduce parts of the GPL header from memory. That doesn't mean my brain is GPLed.
The question was "if I train my model with copyleft material, how do you prove I did?"
If your brain was distributed as software, I think it might?
There is a stupid presupposition that LLMs are equivalent to human brains, which they clearly are not. Stateless token generators are OBVIOUSLY not like human brains, even if you somehow contort the definition of intelligence to include them.
Even if they are not "like" human brains in some sense, are they "like" brains enough to be treated similarly in a legal context? Can you articulate the difference as something other than meat parochialism, which strikes me as arbitrary?
All law is arbitrary. Intellectual property law perhaps most of all.
Famously, the output from monkey "artists" was found to be non-copyrightable, even though a monkey's brain is much more similar to ours than an LLM is [1].
[1] https://en.wikipedia.org/wiki/Monkey_selfie_copyright_disput...
If IP law is arbitrary, then we get to choose between IP law that makes LLMs propagate the GPL and law that doesn't. It's a policy switch we can toggle whenever we want. Why would anyone want the propagates-GPL option, when that setting would make LLMs much less useful for basically zero economic benefit? It's the legal "policy setting" you choose when you basically want to stall AI progress, and it's not going to stall China's progress.