Hacker News

> Right now it's very easy not to infringe on copyrighted code if you write the code yourself.

Humans routinely produce code similar to or identical to existing copyrighted code without direct copying.

They don’t produce enough similar code to infringe frequently. And if they did, independent creation is an affirmative defense to copyright infringement that likely doesn’t apply to LLMs, since they have the demonstrated capability to produce code directly from their training set.

johnisgood 21 hours ago

You have shifted from "very easy not to infringe" to "don't infringe frequently", which concedes the original point that humans can and do produce infringing code without intent.

On independent creation: you are conflating the tool with the user. The defense turns on whether the developer had access to the copyrighted work, not whether their tools did. A developer using an LLM did not access the training set directly; they used a synthesis tool. By your logic, any developer who has read GPL code on GitHub should lose the independent creation defense because they have "demonstrated capability to produce code directly from" their memory.

LLM memorization/regurgitation is a documented failure mode, not normal operation (nor typical case). Training set contamination happens, but it is rare and considered a bug. Humans also occasionally reproduce code from memory, and we do not deny them the independent creation defense wholesale because of that capability!

In any case, the legal question is not settled, but the argument that LLM-assisted code categorically cannot qualify for the independent creation defense creates a double standard that human-written code does not face.

sarchertech 4 hours ago

> You have shifted from "very easy not to infringe" to "don't infringe frequently", which concedes the original point that humans can and do produce infringing code without intent.

Practically speaking, humans do not unintentionally produce code that a court would find to be infringing.

It is theoretically possible, but it is not something that a reasonable person would foresee as a potential consequence.

That’s the difference.

> LLM memorization/regurgitation is a documented failure mode, not normal operation (nor typical case).

Exactly. It is a documented failure mode that you as a user have no capacity to mitigate, or even to notice when it is happening.

Double standards are perfectly fine. LLMs are not conscious beings that deserve protection under the law.

> not settled.

What does appear likely to be settled is that human authorship is required, so there’s no way that an LLM could qualify for independent creation.

direwolf20 17 hours ago

And that's not an infringement. Actual copying is the infringement, not having the same code. The most likely way to have the same code is by copying, but it's not the only way.