> All code must be compatible with GPL-2.0-only

How can you guarantee that will happen when AI has been trained on a world full of multiple licenses, and even closed-source material, without the permission of the copyright owners? I confirmed that with several AIs just now.

You take responsibility. That means if the AI messes up, you get punished. No pushing blame onto the stupid computer. If you're not comfortable with that, don't use the AI.

There’s no reasonable way for you to use AI-generated code and guarantee it doesn’t infringe.

The whole “use it, but if it behaves as expected, it’s your fault” stance is ridiculous.

If you think it's an unacceptable risk to use a tool you can't trust when your own head is on the line, you're right, and you shouldn't use it. You don't have to guarantee anything. You just have to accept punishment.

That’s just it, though: it’s not just your head. The liability could very likely also fall on the Linux Foundation.

You can’t say “you can do this thing that we know will cause problems that you have no way to mitigate, but if it does we’re not liable”. The infringement was a foreseeable consequence of the policy.

This policy effectively punts on the question of what tools were used to create the contribution, and states that regardless of how the code was made, only humans may be considered authors.

From the foundation's point of view, humans are just as capable of submitting infringing code as AI is. If your argument is sound, then how can Linux accept contributors at all?

EDIT: To answer my own question:

> Instead of a signed legal contract, a DCO is an affirmation that a certain person confirms that it is (s)he who holds legal liability for the act of sending of the code, that makes it easier to shift liability to the sender of the code in the case of any legal litigation, which serves as a deterrent of sending any code that can cause legal issues.

This is how the Foundation protects itself, and the policy is that a contribution must have a human as the person who will accept the liability if the foundation comes under fire. The effectiveness of this policy (or not) doesn't depend on how the code was created.
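
For anyone who hasn't seen it in practice: the DCO shows up as a `Signed-off-by:` trailer in the commit message, which `git commit -s` adds for you. Here's a rough sketch of the kind of check a project could run on incoming commits. The function names and the regex are my own, not anything from the kernel's tooling, and a script like this can only tell that somebody put a name down. It can't tell whether the signer is a human or whether the code is clean.

```python
import re

# Hypothetical pattern for a DCO trailer line, e.g.
#   Signed-off-by: Jane Doe <jane@example.org>
# This is an illustrative regex, not the one any real tool uses.
SIGNOFF_RE = re.compile(
    r"^Signed-off-by: (?P<name>[^<>\n]+?) <(?P<email>[^<>\s]+@[^<>\s]+)>$",
    re.MULTILINE,
)


def signoffs(commit_message):
    """Return (name, email) pairs for every Signed-off-by trailer found."""
    return [
        (m.group("name"), m.group("email"))
        for m in SIGNOFF_RE.finditer(commit_message)
    ]


def has_signoff(commit_message):
    """True if at least one person has signed off on the commit.

    This only checks that a name is present; it says nothing about
    how the code was written or whether it infringes anything.
    """
    return bool(signoffs(commit_message))


if __name__ == "__main__":
    message = (
        "example: fix off-by-one in buffer handling\n"
        "\n"
        "Longer description of the change.\n"
        "\n"
        "Signed-off-by: Jane Doe <jane@example.org>\n"
    )
    print(signoffs(message))
    print(has_signoff("example: no trailer here\n"))
```

The point is that the check is about who accepts responsibility, not about which tool produced the diff.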

Anyone distributing copyrighted material can be liable; that DCO isn’t going to stop anyone.

If that worked, any corporation that wanted to use code it legally couldn’t would just use a fork from someone who assumed responsibility, and worst case they’d have to stop using it if someone found out.

> liability could very likely also fall on the Linux foundation.

It’s just the same as if I copy-paste proprietary code into the kernel and lie about it being GPL.

Is the Linux Foundation liable there?

Maybe. DCOs haven’t been tested. But you can at least say that the person who did this committed fraud and that you had no reasonable way to know they would do that.

LLMs can and do regurgitate code without the user’s knowledge. That’s the problem: the user has no way to mitigate it. You’re telling contributors “use this thing that has a random chance of creating infringing code”. You should have foreseen that would result in infringing code making its way into the kernel.

If someone sent you some code and said “it’s all good bro, you can put it in the kernel with your name on it”, would you?

If you don’t feel comfortable about where some code has come from, don’t sign your name.

The fact that LLMs exist and can generate code doesn’t change how you should behave when you sign your name to guarantee something.

The only lawsuits so far have been over training on open source software. You're inventing a liability problem that essentially does not exist.

OpenAI and Anthropic added an indemnity clause to their enterprise contracts specifically to cover this scenario because companies wouldn’t adopt otherwise.

Yeah, but that's not a useful thing to do, because not everybody thinks about that or considers it a problem. If somebody's careless and contributes copyrighted code, that's a problem for Linux too, not only the author.

For comparison, you wouldn't say, "you're free to use a pair of dice to decide what material to build the bridge out of, as long as you take responsibility if it falls down", because then of course somebody would be careless enough to build a bridge that falls down.

Preventing the problem from the beginning is better than ensuring you have somebody to blame for the problem when it happens.

It was already necessary to solve the problem of humans contributing infringing code. It was solved by having contributors assume liability with a DCO. The policy being discussed today asserts that, because AI may not be held legally liable for its contributions, AI may not sign a DCO. A human signature is required. This puts the situation back to what it was with human contributors. What you are proposing goes beyond maintaining the status quo.

It’s not solved. It hasn’t been tested in court to my knowledge, and in my opinion it is unlikely to hold up to serious challenge. You can be held liable for just distributing copyrighted code even if the whole “the Linux Foundation doesn’t own anything” argument holds up.

> Preventing the problem from the beginning is better than ensuring you have somebody to blame for the problem when it happens.

That's assuming that the problems and incentives are the same for everyone. Someone whose uncle happens to own a bridge repair company would absolutely be incentivized to say:

> "you're free to use a pair of dice to decide what material to build the bridge out of, as long as you take responsibility if it falls down"

Their position is probably that LLM technology itself does not require training on code with incompatible licenses, and they probably also tend to avoid engaging in the philosophical debate over whether LLM-generated output is a derivative copy or an original creation (like how humans produce similar code without copying after being exposed to code). I think that even if they view it as derivative, they're being pragmatic - they don't want to block LLM use across the board, since in principle you can train on properly licensed, GPL-compatible data.

> There’s no reasonable way for you to use AI generated code and guarantee it doesn’t infringe.

I guess we’ll need to reevaluate what copyrights mean when derivatives grow on trees?

> That means if the AI messes up

I'm not talking about maintainability or reliability. I'm talking about legal culpability.

If they merge it in despite it having the model version in the commit, then they're arguably taking a position on it too - that it's fine to use code from an AI that was trained like that.

Even human developers are unlikely to have only ever seen GPL-2.0-only code.

Humans will not regurgitate longer segments of code verbatim. Even if we wanted to, we couldn’t do it because our memory doesn’t work that way. An LLM, on the other hand, can totally do that, and there’s nothing you can do to prevent it.

LLMs can, but do they? Is there any evidence that they spit out a piece of code verbatim without being explicitly prompted to do so? In NYT v. OpenAI, for example, NYT intentionally prompted to circumvent OpenAI's guardrails to show NYT articles.

Wait for court cases I suppose - not really Linus Torvalds' job to guess how they'll rule on the copyright of mere training. Presumably having your AI actually consult codebases with incompatible licenses at runtime is more risky.

NIT: All AI code satisfies the GPL license.

Anything generated by an AI is public domain. You can include public domain in your GPL code.

I would urge some stronger requirement with the help of a lawyer. You only need a comment like "completely coded by AI, but 100% reviewed by me" to make that code's license worthless.

The only AI-generated parts that are copyrightable are the ones modified by a human.

I am afraid that this "waters down" the actual licensed code.

...We should start opening issues on "100% vibecoded" projects for relicensing to public domain to raise some awareness of the issue.

> Anything new generated by an AI is public domain[1]

Language models do generate, character for character, existing code on which they were trained. The training corpus usually contains code which is only source-available and not FOSS-licensed.

Generated does not automatically mean novel or new, which is the bar needed for IP.

[1] Even this has not been definitively ruled on in court or codified in IP law and treaties yet.