IMO the bigger question is how would you even tell if a work was generated by an LLM? There's a ton of code being written out there; the folks who generated it are going to claim they authored it for copyright purposes, and those who want to use it are going to claim it was LLM-generated. So what happens?
That code isn't going to be open source. And if you use someone else's closed source code you are violating laws that have nothing to do with copyright.
I'm not sure I understand. I'm not talking about stolen/leaked code here. I'm saying: imagine you claim you're the author of some piece of code. You may or may not have written it with an LLM, but even if so, assume you have the full rights to all the inputs. You post it publicly on GitHub. You don't attach a license, or perhaps you attach a restrictive license that doesn't permit much beyond viewing. Someone comes across your code, finds it brilliant, and wants to use it. If that code was non-copyrightable (such as generated via an LLM), then they're fine doing it without your permission, no? But if that code was copyrightable, then they're not permitted to do so, correct?
So now consider two questions:
- You actually didn't use an LLM, but they believe & claim you did. Who has the burden of proof to show that you actually own the copyright, and how do they do so?
- They write new code that you feel is based on yours. They claim they washed it through an LLM, but you don't believe so. Who has the burden of proof here and how do they do so?
Can you expand on that, please? Which other laws are infringed if you use someone else's closed source code?
You used an illegal leak to train your llm
What makes the leak illegal other than copyright?
The occasional piece of software might be a trade secret, but a person downloading a preexisting leak isn't affected by those laws.