There are licenses that are incompatible with each other, which implies that one wouldn’t be allowed to train LLMs on code based on multiple such licenses.
Sounds reasonable to me - much the same way that building a project from code under multiple incompatible licenses wouldn't be allowed. The alternative is that using an LLM could just be an end-run around the license a developer chose.
Copyright normally only applies when you’re plagiarizing. LLM output typically isn’t that. It’s more like someone having studied multiple open source projects with incompatible licenses and coding up their own version of them, which is perfectly fine. So your “workaround” is overshooting things by far, IMO.
My understanding is that LLMs are plagiarising openly available code - it's not like the code is being used to inspire a person, since that would involve creative thinking. I'd think that taking a piece of code and applying a transformation to make it look different (e.g. changing variable/function names) would still be considered plagiarism. In the case of the GPL, I think it would be entirely appropriate for an LLM trained on GPL code to be required to license its code output as GPL.
I suppose the question is: when does a machine-applied transformation become a new work?