Hacker News

I get the arguments being made here that the second “team,” that’s supposed to be in a clean room, which isn’t supposed to have read the original source code does have some essence of that source code in its weights.

However, this is solved if somebody trains a model with only code that does not have restrictive licenses. Then, the maintainers of the package in question here could never claim that the clean room implementation derived from their code because their code is known to not be in the training set.

It would probably be expensive to create this model, but I have to agree that especially if someone does manage this, it’s kind of the end of copyleft.