Indeed, a developer owns copyright over the source code and over the compiled binaries, because there is no expansion happening here, just a translation from one format into another: the kind of thing that has been ruled copyrightable for as long as copyright has existed. The same goes for translations from one human language into another, and anybody with knowledge of more than one language will be happy to acknowledge that translating is hard work. Even so, the translator does not hold copyright on the result; at best they can say they have created a derived work, and it is the original author who continues to hold copyright.

Compilation and translation happen in a generic manner and do not rely on a mountain of other IP; the compiler is really just a transformative tool that happens to do something useful. Someone constructed it to be a very precise translation, to the point that any mistakes in it are called bugs, and we fix them to ensure the process stays deterministic. Translators try hard to 'get it right' too: to alter the intentions of the original author as little as possible.

When you use a model loaded up with noise, or one that you have trained exclusively on code that you actually wrote, I think a strong case could be made that you own the copyright on that work product. But when you train that model on other people's work, especially without their consent, or use a model that has been trained in that way, you lose your right to call the output of that model yours.

You did not write it, and the transformative process requires terabytes of other people's IP and only a little bit from you.

As soon as you can prove that your contribution substantially outweighs the amount of IP contributed in total, you would have a much stronger case.

>> No, that human owns the copyright on the prompt, not on the work product.

I think I may have misunderstood your original comment above. It seems intended to say:

No, that human owns the copyright on the prompt, not necessarily on the work product. The human may partially hold copyright over the work product as well, with "how much" depending on how much new creative expression came from the human versus from others.

That is in fact correct.

Both the compiler (absent the inclusion of copyrighted libraries) and the LLM are considered not to add creative work, and thus do not change the copyright status of the works they transform.

You can consider the training set of the LLM or other AI model to be 3rd-party libraries, and the level of copyright from them that applies to the final output to be how much of it can be directly considered derivative, just as reading copyrighted code and being inspired by it does not pass that copyright to your work unless the result is obviously derivative.

>> You can consider the training set of the LLM or other AI model to be 3rd party libraries ...

I like this comparison -- training set as '3rd-party libraries'. Except, of course, that the authors behind the training set may not have actually granted permission to use their work, whereas 3rd-party libraries usually grant some permission by way of a license.

+1

Adding two subtle points:

>> Indeed a developer owns copyright over the source code and on the compiled binaries, because there is no expansion happening here but just a translation from one format into another ... does not rely on a mountain of other IP

... and, the license agreement of the compiler and the libraries used / linked to practically always explicitly waives copyright claims over said non-mountain of IP.

>> As soon as you can prove that your contribution substantially outweighs the amount of IP contributed in total you would have a much stronger case.

... a much stronger case that you hold a partial copyright over the work, which is now likely a derivative work. You still may not have a case that you own the copyright exclusively (or, as the original article says, that your employer does).