I don't think it's that clear cut. The functional parts probably aren't copyrightable, only the stylistic ones. It's going to be a mix of courts applying laws in new ways that hasn't been done before and fact specific questions about what actually persisted through the LLM if it goes to court.
I'd be fascinated to see what happens if it does. Both in the analyses that we'd get of what the LLM did to the codebase and on the legal decisions on what the copyrightable creative elements in code actually are.
If I was the author though... there would be no way that I would be volunteering to be a test case like this. Also seems just rude for no reason.
It probably would have been less bad if he had chosen MPL-2.0 or LGPL-2.1-or-later. But he chose MIT, which cuts at the core of the intent of licensing the project with a share-alike license.
Tell me, can I create a copyrighted video that's not GPL licensed using ffmpeg? Now tell me how creating a rust library using the git test suite is different?
> using the git test suite
That's not actually the case at hand here - the agents were given the original source to reference: https://github.com/gitbutlerapp/grit/blob/main/AGENTS.md#sou...
But for the sake of argument: The test suite itself is copyrighted. To the extent the resulting work is a derivative of the test suite it is possibly infringing. For example you might example that the agent would derive variable names, function names, structure sequence and organization of the code from the test suite. It might even copy comments wholesale. Those are copyrightable things. (Which is of course just the first step in analyzing if it is infringement, there would be interesting fair use, de-minimis copying, etc arguments following a conclusion that any of those were copyrighted. A product produced this way definitely could be infringing given the right facts though).
> That's not actually the case at hand here - the agents were given the original source to reference: https://github.com/gitbutlerapp/grit/blob/main/AGENTS.md#sou...
yeah fair - the "The canonical Git source code we're targeting to replicate the functionality of is in the git/ subdirectory." part makes this hard to argue against.
> To the extent the resulting work is a derivative of the test suite it is possibly infringing
It's this bit that I have a problem with. If I run the test, it fails and reports a failure. Now I write code and run the test again. What is the theory there that code that I wrote infringes.
Simplify this down:
Assume the following is copyrighted:
Does writing the following code: infringe on the test copyright?Writing
Doesn't infringe upon copyright period, because there's no creative element in that work.Imagine a more substantial example though. Perhaps you have a test that checks that some file written in a binary format is correct, and gives names (creative elements) to each field of the format that it prints when you mess up the field, and has comments describing why the bytes are laid out like they are (the comments being copyrightable even if the facts they describe aren't), and the LLM copies those field names and comments verbatim... Now it's quite likely that the LLMs work is a derivative of the test suite.
> Doesn't infringe upon copyright period, because there's no creative element in that work.
There's likely a threshold at some point. It's helpful to look at a minima and then continue from there though.
I'm curious if there's case law that supports your assertions here?
For that assertion in particular I believe I'm practically parroting a ruling by the district court in Oracle vs Google about some extremely simple Java functions that Oracle claimed Google copied. Though I can't say I checked to make sure I'm remembering right.
You're recalling it right, but there's a nice quote from Judge Alsup in that case that talks about this exact situation:
> “So long as the specific code used to implement a method is different, anyone is free under the Copyright Act to write his or her own code to carry out exactly the same function or specification...”
Here given that this is rust and the original expression is C, the implementations cannot be the same by definition.
That's essentially the same thing as modding a game, though. I know there have been lawsuits to stop modding, but I don't think any were successful.
If you did it in a loop until the test passed, maybe?
Your result is essentially impossible without the original. With ffmpeg, your result does not depend on ffmpeg specifically - you can use any video creation tool.
A GPL tool that processes data doesn't virally transfer the license to its output. Copyrighted ffmpeg code isn't incorporated into the video output. The LLM didn't just conjure up equivalent behavior to git without ingesting the code and transforming it as new output. There is no other behavioral description that would reproduce all needed functionality.
Medium, substitutibility, basics of copyright law.
Fair point on medium - this was a lazy example.
Substitutibility probably doesn't apply here in the way you're implying and if it did it would likely be hampered by the 9th circuits findings about transformation in sony v connectix. Arguments here likely would look at rust not having a stable ABI, and hence not being inherently substitutable as a libray (grit-lib), less clear as an executable (grit-cli) on that side
basics of copyright law - the fundamental thing being protected is the expression... is a rust program's expression the same expression as a c program? I'd say generally not.
The test suite could test aspects of the architecture/design of the codebase that are not necessary for interoperability and constitute novel expression of a piece of software in a way that is not at all language specific.
By definition a test suite is about testing interoperability with the test suite. An HTTP test suite should likely test for whether response code 418 is implemented a particular way and while humorous it would still be an interop test no?
No, the git test suite is about testing the git codebase. If you want something like that, you need a conformance suite, which does not exist for git.
If feeding the source code through a complier yields a derivative work, why wouldn't feeding it to an LLM give the same result?
Because compilers and LLMs do different things, and what is done matters, so you can't reason by stepping from one to the other.
Compilers don't axiomatically yield derivative works, they simply in practice do because for non-trivial programs they preserve copyrightable elements of the work in the output.
So, if we will compile or decompile code using LLM instead of a compiler, then we can use the result for free?
(LLM can translate code to/from other code or to/from a machine code).
Well compilers are a mechanical transformation and if that were sufficient to free you of IP law then IP law wouldn't work.
An LLM is also a computer program which takes input and produces output related in some way to that input. However I don't think most people would view it as a "mere" mechanical transformation. One could tautologically argue that an LLM blends the user input with the training inputs which is a sort of transformation and further that the LLM itself is a computer program thus it is mechanical in nature. However it should be immediately obvious that such an overly literal interpretation is in danger of subsuming human work as well. Where the boundary lies is an unanswered question.
Related, compilers can pose a problem depending on what the output includes. For example common lisp compilers that aren't under a permissive license are a minefield because regardless of what anyone might say the image that gets output includes (approximately) the full language implementation verbatim in addition to the user's program.
functional parts not being copyrightable means that you can't claim a program is a copyright violation based on the fact it does the exact same thing based on compatibility reasons (you can copy what the program does). E.g. git stores refs in .git/refs, so does grit, that's not a violation. You still can't copy the program.
Yes... and now we get to the fact specific question of "did they copy the program". Or actually the answer to that is plainly "no" - they made something similar from it - and didn't run ctrl-c ctrl-v in an unlicensed manner, but "did they copy the relevant facets of the program into the new similar thing".
Making something similar is copying for the purpose of copyright law. If I trace over a Disney character it's still copyright Disney.
No. You're allowed to make a similar tool, the functional elements are not copyrightable. There's a long history, predating LLMs by many decades, of doing this in the software industry.
My use of the word "similar" does not imply here that I think it's obvious that they are "similar" in any copyrightable elements - whether they are or not is one of the interesting questions I think this case would have to resolve.
Incidentally you're also allowed to make similar creative elements so long as they aren't copies and you did so independently... which could actually come up in a case like this (imagine the LLM produced a similar function to some function in the original... but the original wasn't in the context window at the time. Not at all unlikely with code where there often is only one or two natural ways to write something).