> If you write a book and I take it and embed its knowledge into my product that is so pervasive that no one needs to buy your book any more (and I don't even credit you so no one knows where that knowledge came from), do you really still have what was stolen?
The trouble with this analogy is that it proves too much.
Suppose you write a book, and so does someone else, but they have better marketing than you and then people in the market for that genre buy theirs instead of yours. Let's even stipulate that the existence of their book actually lowers your sales, because people who want that kind of book already bought theirs by the time they find out about yours and then some people don't have time to read or can't afford to buy both.
Notice that we haven't yet said a word about the contents of either book. The two books could be completely independent, and the other author may never even have heard of you or your book -- they "didn't even buy a copy of your book to copy it". All we know is that the books are in the same genre and the existence of theirs is costing you sales. By that logic all competition would be "stealing", and that can't be right.
Which implies that you don't have a property right to the customers.
I like your argument, not because it is a good analogy for AI but because it is a good contrast. Copyright isn't a guarantee or magic force field blocking fair competition. It is a permeable buffer against lazy knockoffs and time-boxed so that buffer doesn't choke all future creativity.
People on this thread need to focus on what "derivative" and "fair use" mean and understand both are measured on a somewhat fuzzy spectrum, subject to interpretation.
In a perfectly fair world AIs/MLs could vacuum up all human knowledge, fair and square. (In an ideal world, they would do that while adhering to polite opt-in/opt-out agreements with copyright holders. We can dream.) Input isn't theft.
On output, two magic genies would stand at the gate, the Derivative Genie and the Fair Use Genie, and review anything spat out by the AI/ML. If it crossed agreed-upon thresholds, the Genies would bar the gates and issue a stern warning to prompt again (or maybe the AI/ML would auto-adjust the prompt and try again).
So, if your prompt asked for a 300-word poem about thrash metal mosh pit dancing and it spat out a poem where 85% of it matched one of the handful of available mosh pit poems in its database, the Derivative Genie would block the output and raise an alarm.
On the other hand, if you asked for a line-by-line analysis of a famous mosh pit dancing poem (by name) or maybe asked for a satirical spoof of said poem, the Fair Use Genie would overrule the Derivative Genie and give the output a pass.
That's as fair as this could get, especially if you add one more thing: an Appeals Court (maybe corporate, maybe 3rd party, maybe state run) with a Settlement Pool. If a copyright holder could prove the Genies let pass something they shouldn't have, the AI/ML would fix that. If real damage is done, the creator would get a settlement from the pool.
The point is that the Input Genie is out of the bottle. Creators just look foolish trying to squeeze it back in. Better, they should focus on making the output Genies and the Appeals process as effective and fair as possible for everyone.
A better analogy would be that you do original research or work and produce a valuable book. Somebody else looks at your work, decides it has value, and reproduces it in a new book under their name. The new book is cheaper, or easier to find, or for whatever reason displaces your original book created through your own research and investment. Now somebody else is profiting off your creativity or work, without payment or even acknowledgement.
I'm not sure how this plays out legally, but it certainly seems unethical.
So, for example, when Disney sees value in public domain stories like Cinderella, Rapunzel/Tangled or Snow White and makes movies out of them, profiting from the creativity and work of the Brothers Grimm without paying anything to their estate, or when high schools stage Shakespeare plays, that seems unethical to you?
Would it be fair for Greece to do retroactive term extensions all the way back to Plato and then sue anyone who copies the idea of having a university or uses the Platonic solids or distributes religious texts that incorporate the dualistic theory of the soul?
Your examples, as you say, are all public domain. Are all the works we train LLMs on public domain too? Was the original book in my analogy in the public domain? What do you think about training on material that isn't yet in the public domain?
You're framing this as an ethical question, but copyright term lengths are essentially arbitrary. They're set by the government, as are the boundaries of fair use. At that point you're making a circular argument: that it's bad if it's illegal, and that it should be illegal because it's bad. So what happens if someone argues the opposite? That it's not unethical if it's fair use, and that it should be fair use because it's not unethical.
Why are you talking about this case that has nothing to do with the topic at hand? The comment you’re replying to gives a very clear and narrow analogy, and you’re talking about something else.
How is it something else? It's the same analogy. The problem with it is that the harm from the alleged theft doesn't require any use of the original material in order to happen, since that "harm" is competition rather than expropriation.
The attempt to distinguish them is through copying, but that's the part that isn't depriving anyone of anything.
The main point here is _using_ copyrighted materials to create a commercial product, that you then sell, that may be used as an alternative or substitute for the original materials. You’re missing that point and talking about two independent projects competing.
Because the competition is the only source of alleged harm, but people can do that even if they don't copy anything. There isn't actually a property right to the customers. You can lose sales to someone else whether they copied anything or not.
So what if you can lose sales even without crimes being committed? This somehow makes it okay to profit off someone’s work and ignore licenses?