How exactly does it violate the Developer's Certificate of Origin clause?

The submitted code must adhere to either of (a), (b), (c), and separately a (d) clause of: https://github.com/nodejs/node/blob/main/CONTRIBUTING.md#dev...

If submitter picks (a) they assert that they wrote the code themselves and have right to submit it under project's license. If (b) the code was taken from another place with clear license terms compatible with the project's license. If (c) contribution was written by someone else who asserted (a) or (b) and is submitted without changes.

Since LLM generated output is based on public code, but lacks attribution and the license of the original it is not possible to pick (b). (a) and (c) cannot be picked based on the submitter disclaimer in the PR body.

Not sure if you are intentionally misrepresenting (a), but here is the full text

(a) The contribution was created in whole or in part by me and I have the right to submit it under the open source license indicated in the file; or

That seems exclusive of LLMs, as the user didn't create the contribution, the LLM did.

If there's a "the original" the LLM is copying then there's a problem.

If there isn't, then (b) works fine, the code is taken from the LLM with no preexisting license. And it would be very strange if a mix of (a) and (b) is a problem; almost any (b) code will need some (a) code to adapt it.

To many, it qualifies under either A or B, and therefore C as well. Under A, you can think of the LLM as augmenting your own intelligence. Under B, the license terms of LLM output are essentially that you can do whatever you want with it. The alternative is avoiding use of AI because of copyright or plagiarism concerns.

It would be considered (a) since the author would own the copyright on the code.

Owning copyright of something and writing it are very different things

Citation needed.

Whether AI output can fall under copyright at all is still up for debate - with some early rulings indicating that the fact that you prompted the AI does not automatically grant you authorship.

Even if it does, it hasn't been settled yet what the impact of your AI having been trained on copyrighted material is on its output. You can make a not-completely-unreasonable argument that AI inference output is a derivative work of AI training input.

Fact is, the matter isn't settled yet, which means any open-source project should assume the worst possible outcome - which in practice means a massive AI-generated PR like this should be treated like a nuke which could go off at any moment.

Why write open-source software at all, when the government could outlaw open-source entirely? What if an asteroid destroys Earth and there are no humans left to enjoy your work? At some point, you have to agree that a risk isn't worth worrying about. And your "worst possible outcome" is just the arbitrary outcome that you think has some subjective risk threshold. And it's certainly not one I agree with. Furthermore, calling it a "nuke" is a bad analogy because that implies that it can't be put back in the bottle once opened. In reality, we're dealing with legal definitions, which can be redefined as easily as defined.

The two main points are that:

1. Copyright cannot be assigned to an AI agent.

2. Copyrighted works require human creativity to be applied in order to be copyrighted.

For point 2 this would apply to times were AI one shots a generic prompt. But for these large PRs where multiple prompts are used and a human has decided what the design should be and how the API should look you get the human creativity required for copyright.

In regards to being a derivative work I think it would be hard to argue that an LLM is copying or modifying an existing original work. Even if it came up with an exact duplicate of a piece of code it would be hard to prove that it was a copy and not an independent recreation from scratch.

>the worst possible outcome

The worst possible outcome is they get sued and Anthropic defends them from the copyright infringement claim due to Anthopic's indemnity clause when using Claude Code.

That indemnity clause is only for Team, Enterprise and API users. Do you know what was used here?

Also the commercial version is limited to “…Customer and its personnel, successors, and assigns…”. I am very much not a lawyer and couldn’t find definitions of these in the agreement but I am not sure how transferable this indemnity would be to an open source project.

I reviewed it and it looks like personal Claude Code subscriptions are not covered, so it's riskier than I claimed.