I do not agree with your interpretation of copyright law. It does ban copies, but there has to be information flow from the original to the copy for it to count as a "copy." Independent creation of the same content is often taken by the courts as a sign that the content is purely functional, dictated by the requirements and mathematical constraints rather than copied.
Patent law is different and doesn't rely on information flow in the same way.
Derivative works can also run afoul of copyright. An LLM trained on a corpus of copyrighted code is creating derivative works no matter how obscure the process is.
That actually isn't what legal precedent currently says. Courts are currently looking at the actual output, not at whether the model itself is "tainted." If you think this is morally wrong, look into getting the laws changed (serious).
What about a human with 30 years of experience working with copyrighted codebases?
Said human would likely not be able to create a clean-room implementation of any of the codebases they worked on.
U.S. District Judge William Alsup ruled that Anthropic made "fair use" of books, deeming the use "exceedingly transformative":
"Like any reader aspiring to be a writer, Anthropic's LLMs trained upon works not to race ahead and replicate or supplant them — but to turn a hard corner and create something different"
I disagree that information flow is required. Do you have a reference for that? Certainly it is an important consideration. But consider all the real literary works contained in the infinite Library of Babel.[1] Are they original works just because no copy was used to produce them?
[1]: https://libraryofbabel.info/
Yes; the works are original.
However, describing the path you need to get there is itself copyright infringement: the library's addressing scheme is invertible, so an address encodes the full text of the page it points to.
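To see why an address carries the same information as the page, here is a minimal sketch of an invertible addressing scheme. This is a hypothetical illustration, not libraryofbabel.info's actual algorithm: it simply treats each page as a number written in base 29 over the library's 29-character alphabet, so the "address" and the text are two encodings of the same value.

```python
# Hypothetical sketch: a bijective address <-> page mapping.
# Alphabet of 29 symbols (26 letters, space, comma, period), as in Borges' library.
ALPHABET = "abcdefghijklmnopqrstuvwxyz ,."

def address_of(text: str) -> int:
    """Encode a page of text as its numeric address (interpret text as base-29)."""
    n = 0
    for ch in text:
        n = n * 29 + ALPHABET.index(ch)
    return n

def page_at(address: int, length: int) -> str:
    """Decode an address back into the page text it encodes."""
    chars = []
    for _ in range(length):
        address, digit = divmod(address, 29)
        chars.append(ALPHABET[digit])
    return "".join(reversed(chars))

text = "hello, library"
addr = address_of(text)
# The round-trip is exact: the address is just the text in another notation,
# so communicating the address communicates the work.
assert page_at(addr, len(text)) == text
```

Because the mapping is a bijection, "I only gave you the address, not the text" is a distinction without a difference, which is the point being made above.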
Well, discovery might be a fun exercise, to see whether the code is in the LLM's training dataset.
if?