Makes sense to me. But so anybody can take public domain code and place it under the GNU General Public License (by dropping it into a Linux source-code file)?
Surely the person doing so would be responsible for doing so, but are they doing anything wrong?
> Surely the person doing so would be responsible for doing so, but are they doing anything wrong?
You're perfectly at liberty to relicense public domain code if you wish.
The only thing you can't do is enforce the new license against people who obtain the code independently - either from the same source you did, or from a different source that doesn't carry your license.
This is correct, and it's not limited to code. I can take the story of Cinderella, create something new out of it, copyright my new work, but Cinderella remains public domain for someone else to do something with.
If I use public domain code in a project under a license, the whole work remains under the license, but not the public domain code.
I'm not sure what the hullabaloo is about.
If someone else uses your exact same prompt to generate the exact same code, can you claim copyright infringement against them? If the output can be copyrighted, then you could claim their prompt is infringement (just like if it reproduced Harry Potter). If it isn’t copyrightable, then the kernel would not have legal standing to enforce the GPL on those lines of code against any future AI reproduction of them. The developers might need to show that the code is licensed under the GPL and only the GPL, otherwise there is the possibility that the same original contributor (e.g. the AI) did permit the copy. The GPL is an imposed restriction on what the kernel can legally do with any code contributions. That seems legally complicated for some projects: probably not the kernel, with its large amount of pre-AI code, but maybe it spells trouble for smaller, newer projects if they want to sue over infringement. IANAL.
> If someone else uses your exact same prompt to generate the exact same code, can you claim copyright infringement against them?
No, because they've independently obtained it from the same source that you did, so their copy is "upstream" of your imposing of a new license.
Realistically, adding a license to public domain work is only really meaningful when you've used it as a starting point for something else, and want to apply your license to the derivative work.
Copyright infringement is triggered by the act of copying, not by having the same bytes.
Be careful here - you cannot copyright a story, only the specific tangible form of the story.
Which is why I used precise language: "copyright my new *work*."
The core thing about licenses, in general, is that they only grant new usage. If you can already use the code because it's public domain, they don't further restrict it. The license, in that case, is irrelevant.
Remember that licenses are powered by copyright - granting a license to non-copyrighted code doesn't do anything, because there's no enforcement mechanism.
This is also why copyright reform for software engineering is so important, because code entering the public domain cuts the Gordian knot of licensing issues.
Linux code doesn't have to strictly be GPL-only, it just has to be GPL-compatible.
If your license allows others to take the code and redistribute it with extra conditions, your code can be imported into the kernel. AFAIK there are parts of the kernel that are BSD-licensed.
SQLite’s source code is public domain. Surely if you dropped the SQLite source code into Linux, it wouldn’t suddenly become GPL code? I’m not sure how it works.
The Linux kernel would become a GPLv2-licensed derivative work of SQLite, but that doesn’t matter, because public domain works, by definition, are not subject to copyright restrictions.
Claiming copyright on an unmodified public domain work is a lie, so in some circumstances could be an element of fraud, but still wouldn’t be a copyright violation.
This ruling is IMO/IANAL based on lawyers and judges not understanding how LLMs work internally, falling for the marketing campaign calling them "AI" and not understanding the full implications.
LLM-creation ("training") involves detecting/compressing patterns of the input. Inference generates statistically probable output based on similarities to the patterns found in the "training" input. Computers don't learn or have ideas; they always operate on representations, and it's nothing more than any other mechanical transformation. It should not erase copyright any more than synonym substitution does.
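To make the "pattern counting" description concrete, here's a toy sketch: a bigram word model, which is vastly simpler than any real LLM and is only meant to illustrate the mechanical flavor of the claim. The function names `train` and `generate` and the tiny corpus are made up for the illustration, not taken from any actual system.

```python
import random
from collections import Counter, defaultdict


def train(text):
    """'Training': count which word follows which in the input text."""
    counts = defaultdict(Counter)
    words = text.split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def generate(counts, start, length, seed=0):
    """'Inference': repeatedly sample a next word, weighted by the counts."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    out = [start]
    for _ in range(length - 1):
        followers = counts.get(out[-1])
        if not followers:
            break  # this word was never followed by anything in the input
        candidates = list(followers)
        weights = [followers[w] for w in candidates]
        out.append(rng.choices(candidates, weights=weights)[0])
    return out


# Hypothetical miniature "training set".
corpus = "the cat sat on the mat and the cat ate the fish"
model = train(corpus)
sample = generate(model, "the", 8)
print(" ".join(sample))
```

Every adjacent word pair this emits already occurs somewhere in the input, although the overall sequence may not. Whether that kind of recombination should count as copying is exactly what's being argued here.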
>LLM-creation ("training") involves detecting/compressing patterns of the input.
There's a pretty compelling argument that this is essentially what we do, and that what we think of as creativity is just copying, transforming, and combining ideas.
LLMs are interesting because that compression forces distilling the world down into its constituent parts and learning about the relationships between ideas. While it's absolutely possible (or even likely for certain prompts) that models can regurgitate text very similar to their inputs, that is not usually what seems to be happening.
They actually appear to be little remix engines that can fit the pieces together to solve the thing you're asking for, and we do have some evidence that the models are able to accomplish things that are not represented in their training sets.
Kirby Ferguson's video on this is pretty great: https://www.youtube.com/watch?v=X9RYuvPCQUA
So? Why should it be legal?
If people find this cool and wanna play with it, they can, just make sure to only mix compatible licenses in the training data and license the output appropriately. Well, the attribution issue is still there, so maybe they can restrict themselves to public domain stuff. If LLMs are so capable, it shouldn't limit the quality of their output too much.
Now for the real issue: what do you think the world will look like in 5 or 10 years if LLMs surpass human abilities in all areas revolving around text input and output?
Do you think the people who made it possible, who spent years of their life building and maintaining open source code, will be rewarded? Or will the rich reap most of the benefit while also simultaneously turning us into beggars?
Even if you assume 100% of the people doing intellectual work now will convert to manual work (i.e. there's enough work for everyone) and robots don't advance at all, that'll drive the value of manual labor down a lot. Do you have it gamed out in your head and believe somehow life will be better for you, let alone for most people? Or have you not thought about it at all yet?
> Do you think the people who made it possible, who spent years of their life building and maintaining open source code, will be rewarded?
I think they should be rewarded more than they are currently. But isn't the GNU General Public License basically saying you can use such source code without giving any rewards whatsoever?
But I see your point. The reward for Open Source developers is the public recognition for their works. LLMs can take that recognition away.
The best answer to those issues is still Basic Income.
UBI only means you won't starve or die of exposure. It doesn't mean that people who are already rich today won't become so obscenely rich tomorrow they are above the law or can change the law (and decide who gets medical treatment or even take your UBI away).
fortunately, you aren't only operating on representations, right? lemme check my Schopenhauer right quick...