The solution to this whole situation seems pretty simple to me. LLMs were trained on a giant mix of code, and it's impossible to disentangle it, but a not insignificant portion of their capabilities comes from GPL licenced code. Therefore, any codebase that uses LLM code is now GPL. You have a proprietary product? Not anymore.
Not saying there's a legal precedent for that right now, but it's the only thing that makes any sense to me. Either that or retrain the models on only MIT/similarly licenced code or code you have explicit permission to train on.
What about the code that wasn't even GPL, but "all rights reserved", i.e., without any license? That's even stronger than GPL and based on your reasoning, this would mean that any code created by an LLM is not licensed to be used for anything.
Okay. That's fine with me. I was trying to be generous and assume the GPL would be the strongest.
Yes.
if you train yourself by looking at GPL code then go implement your own things, is that code GPL?
It can be, depending on whether it is different enough to convince a jury that it is not a copyright violation. See the lawsuits brought by Marvin Gaye's family for how unpredictable that can be.
I work with people who literally won't even look at GPL code, because of the risk. So yes, potentially.
Of course not, because everyone making these arguments wants people to have some magic sauce so they get to ignore all the rules placed on the "artificial" thing.
If you genuinely believe that you are not above a literal text completion algorithm and do not deserve any more rights than it, that says more about you than anything else.
If you genuinely believe you cannot create something that has just as many rights as you have, then I feel sorry for your children and anything you create.
If you copy and paste one line from a thousand different GPL projects, is the resulting program GPL?
Let's be honest about what's happening here.
It could be. The amount of code you copy doesn't matter; it just depends on context and whether your work could now be considered derivative.
I said this elsewhere, but I work with people who won't even look at GPL code because of the potential legal entanglements.
Yes, let's. Corporations with billions of dollars behind them wholesale stole copyrighted work and licensed code to train models, and then turned around and sold the result with no attribution or monetary benefit given to the people they stole from. They knew what they were doing and relied on the legal system being slow enough that they could plant a flag in the market before legal challenges killed them.
It's an industry built on theft. By all rights they should have been sued/fined out of existence before it ever got this far. But if you have enough money you can make almost anything legal.
100% agree, if we are fair and honorable.
In practice, well... you saw what's been going on with the Epstein files, etc. We are far from living in a world that's fair and honorable.
(I'm not condoning it. I think it's massively trashy to steal code like this and then pretend you're the good guy through some super weird mental gymnastics.)
Completely agree. This isn't practical. It's never going to happen just because of the sheer amount of capital behind LLM companies.
You can do anything rotten, as long as you throw enough money at it.