This is precedent setting. In this case the rewrite was in same language, but if there's a python GPL project, and it's tests (spec) were used to rewrite specs in rust, and then an implementation in rust, can the second project be legally MIT, or any other?

If yes, this in a sense allows a path around GPL requirements. Linux's MIT version would be out in the next 1-2 years.

> but if there's a python GPL project, and it's tests (spec) were used to rewrite specs in rust, and then an implementation in rust, can the second project be legally MIT, or any other?

Isn't that what https://github.com/uutils/coreutils is? GNU coreutils spec and test suite, used to produce a rust MIT implementation. (Granted, by humans AFAIK)

Its very important to understand the "how" it was done. The GPL hands the "compile" step, and the result is still GPL. The clean Room process uses 2 teams, separated by a specification. So you would have to

1. Generate specification on what the system does. 2. Pass to another "clean" system 3. Second clean system implements based just on the specification, without any information on the original.

That 3rd step is the hardest, especially for well known projects.

So what if a frontier model company trains two models, one including 50% of the world's open source project and the second model the other 50% (or ten models with 90-10)?

Then the model that is familiar with the code can write specs. The model that does not have knowledge of the project can implement them.

Would that be a proper clean room implementation?

Seems like a pretty evil, profitable product "rewrite any code base with an inconvenient license to your proprietary version, legally".

LLM training is unnecessary in what we're discussing. Merely LLM using: original code -> specs as facts -> specs to tests -> tests to new code.

It is hard to prove that the model doesn't recognize the tests and reproduces the memoized code. It's not a clean room.

1 is claude-code1, outputs tests as text.

2. Dumped into a file.

3. claude-code that converts this to tests in the target language, and implements the app that passes the tests.

3 is no longer hard - look at all the reimplementations from ccc, to rewrites popping up. They all have a well defined test suite as common theme. So much so that tldraw author raised a (joke) issue to remove tests from the project.

Treating an AI-assisted rewrite as a legal bypass for the GPL is wishful thinking. A defensible path is a documented clean-room reimplementation where a team that never saw the GPL source writes independent specs and tests, and a separate team implements from those specs using black-box characterization and differential testing while you document the chain of custody.

AI muddies the water because large models trained on public repos can reproduce GPL snippets verbatim, so prompting with tests that mirror the original risks contamination and a court could find substantial similarity. To reduce risk use black-box fuzzing and property-based tools, have humans review and scrub model outputs, run similarity scans, and budget for legal review before calling anything MIT.

I'm somewhat confused on how it actually muddies the waters - any person could have read the source code before hand and then either lied about it or forgot.

Our knowledge of what the person or the model actually contains regarding the original source is entirely incomplete when the entire premise requires there be full knowledge that nothing remains.

No, GPL still holds even if you transform the source code from one language to another language.

That why I carved it out to just the specs. If they can be read as "facts", then the new code is not derived but arrived at with TTD.

The thesis I propose is that tests are more akin to facts, or can be stated as facts, and facts are not copyright-able. That's what makes this case interesting.

I assumed that "tests" refers to a program too, which in this example is likely GPL. Thus GPL would stick already on the AI-rewrite of GPL test code.

If "tests" should mean a proper specification let's say some IETF RFC of a protocol, then that would be different.

Yes, I had not specified in my original comment. But in the SOTA LLM world code/text boundary is so blurry, so as to be non-existent.