The big difference between people reading code and LLMs reading code is that people have legal liability and LLMs do not. You can't sue an LLM for copyright infringement, and it's almost impossible for users to tell when it happens.

BTW in 2023 I watched ChatGPT spit out hundreds of lines of F# verbatim from my own GitHub. A lot of people had this experience with GitHub Copilot. "98.7% unique" is still a lot of infringement.

> people have legal liability and LLMs do not. You can't sue an LLM for copyright infringement

That's not relevant, because you can still sue the person using the LLM and publishing the repository. Legal liability is completely unchanged.

> Legal liability is completely unchanged.

It's changed completely, from your own example.

If you commission art from an artist who paints a modified copy of Warhol's work, the artist is liable (even if you keep that work private, for personal use).

If you commission it from OpenAI (by sending a query to their ChatGPT API), by your argument, you are the person liable — and OpenAI is off the hook even if that work is distributed further.

I'm not going to argue about the merits of creativity here, or whether someone who puts a prompt into ChatGPT counts as an artist.

That's irrelevant. The work is created on OpenAI servers, by the LLMs hosted there, and is then distributed to whoever wrote the prompt.

Models run locally are distributed by whoever trained them.

If you train a model on whatever data you legally have access to, and produce something for yourself, it's one thing.

Distribution is where things start to get different.

> If you commission it from OpenAI (by sending a query to their ChatGPT API), by your argument, you are the person liable — and OpenAI is off the hook even if that work is distributed further.

Let's distinguish two different scenarios here:

1) Your prompt is copyright-free, but the LLM produces a significant amount of copyrighted content verbatim. Then the LLM is liable, and you too are liable if you redistribute it.

2) Your prompt contains copyrighted data, and the LLM transforms it, and you distribute it. Then if the transformation is not sufficient, you are liable for redistributing it.

The second example is what I'm referring to, since the commercial LLMs are now very good about not reproducing copyrighted content verbatim. And yes, from everything I understand legally, OpenAI is off the hook.

Your example of commissioning an artist is different from LLMs, because the artist is legally responsible for the product and is selling the result to you as a creative human work, whereas an LLM is a software tool and the company is selling access to it. So the better analogy is renting a Xerox copier to copy something by Warhol. Xerox is not liable if you try to redistribute that copy, but you are. Here, Xerox = OpenAI: they are not liable for your copyrighted inputs turning into copyrighted outputs.


You can sue the company making the LLM, which is what many have done.