Hacker News

We need LLMs that have a certificate of origin.

For instance, a GPL LLM trained only on GPL code, where all of the source data is known and all of the output is GPL.

It could be done with a distributed effort.

It is not clear that copyright carries over to the LLM output; that is, the output is not necessarily a derivative work.

So "copyleft" doesn't work on any of the output. Therefore no GPL applies.

Not necessarily a bad idea, but I think the bigger issue here and now is the increasing asymmetry in effort between code submitter and reviewer, and the unsustainable review burden on the maintainers if nothing is done.

nottorp 4 hours ago

I don't think the licensing issues are the main problem; the spam is.

andy12_ 2 hours ago

Honestly, given that such a GPL model would be far below SOTA in capabilities, what exactly would its use case be? Why would anyone try to use an inferior LLM if they can get away with using a superior one?

duskdozer 40 minutes ago

It doesn't make sense, because GPL means that only GPL comes out, not that only GPL can go in:

>Many of the most common free-software licenses, especially the permissive licenses, such as the original MIT/X license, BSD licenses (in the three-clause and two-clause forms, though not the original four-clause form), MPL 2.0, and LGPL, are GPL-compatible. That is, their code can be combined with a program under the GPL without conflict, and the new combination would have the GPL applied to the whole (but the other license would not so apply). https://en.wikipedia.org/wiki/License_compatibility#GPL_comp...

A model that contains no GPL code makes sense, so that people using non-GPL licenses don't violate the GPL.

duskdozer 4 hours ago

Rather, LLMs that do NOT contain GPL code.