>It's clear to me that AI training is transformative fair use under existing law.

I wouldn't even go that far. It's an entirely new product. It's like the guy who sold you the keyboard demanding royalties for the software you built.

That the person who wrote the book couldn't predict a new use case for the book in training LLMs is irrelevant. The book isn't in the LLM. It's not being sold with the LLM. It's one of billions of tools used to create the LLM.

People try to sell this as the AI companies extracting value from the poor little IP holders like Disney. It's maddening. That content is your cultural heritage. It already belongs to you; it's just that some idiot has been granted a lifetime of exclusive exploitation. An LLM is trained on data you already own. Disney et al. want to exploit the new technology to extract even more money out of stuff that was often created decades ago.

At absolute worst it's reverse engineering, which was supposed to be protected as fair use in the US, but apparently that's been somewhat eroded.

> The book isn't in the LLM.

An LLM is essentially a lossy compression of the training data. The book absolutely is in there, it’s just mangled to the point of unrecognizability.

The wood tends to have an impression of the hammer that hits it. The book isn't in there; the weights are just shaped by the tools that were used to form them.

When large quantities of source material are replicable by prompting, it's a bug, not a feature.

That's just semantics. The wood would be there without the hammer; the LLM wouldn't be here without the copyrighted works it's based on.

No, that's just semantics.

>the LLM wouldn't be here without the copyrighted works

Google wouldn't be here if it hadn't scraped every copyrighted website and used them to form a searchable graph of the internet, but we only hear complaints about them when they reproduce entire news articles.

If my book isn’t in your LLM, then prove it and don’t use my book to train your LLM.

>don’t use my book to train your LLM.

What makes you think you are entitled to tell people what they can and can't do with data they purchased (or otherwise acquired) from you? Extremely honest question. I just can't put myself in your shoes.

Like, if I had written anything useful, I would be overwhelmingly flattered that my content was considered worthy of inclusion.

Your profile suggests that you are a philosopher. Did you get into philosophy hoping to exploit the publishing industry to the extent that you can squeeze every cent out of your thoughts, and deny their potential uses downstream?

It's actually crazy how bad things are. I am usually keen on capitalism and exclusivity, but with this whole LLM thing, I see people pushing hard to tighten the grip of intellectual property. I see people making 50 cents a month on Kindle Unlimited suddenly shocked that someone's LLM-generated output might be ever so slightly influenced by weights ever so slightly influenced by their work, seemingly thinking they might get some big payday out of it.

Give me a tiny little wedge of understanding of your thought process. Your book is, right now, doing a greater social good on your behalf than me running around and removing all the trash from my neighborhood, and the benefits of that social good are going to accrue long after you and I are gone. Your work is now going to live on, in a very tiny way, in these systems forever. I am honestly envious.

If anything, I would be trying to get bad writing removed from LLM training data. Things that I don't want to influence others. But as a potentially honest promoter of your work, you want it removed?

What's the number? If not 1:1 exactly what you charge for the book, what do you think is the proper compensation you should receive for slightly influencing training weights?

> What makes you think you are entitled to tell people what they can and can't do with data they purchased

Hundreds of years of copyright law. I bought a copy of Windows, but I’m not allowed to modify that data with a cracker and sell a bootleg DVD of it.

I should edit to clarify that I’m not a big fan of Lars Ulrich or Disney, but I don’t think we’re going to get a win here for the recreational IP pirates. What’s more likely is that we’ll end up with some Frankenstein law that favors both Mickey Mouse and OpenAI, and you and I will get neither free movies nor the ability to earn a living off of our creative labor.

I mean, the comparable situation would be being allowed to sell something you created on Windows.

But in the abstract, you should absolutely be able to modify and sell Windows.

To continue your analogy, I had to pay for Windows before I was allowed to create something with it, or acquire a license for it under terms they set forth. If AI companies stopped at the public domain, then my argument wouldn't really hold up, but they didn't do that. They acquired everyone's copyrighted works without regard for the license and now they're, in the most charitable interpretation, using them to create derivative works.

And before you give me an analogy about how someone could listen to Pink Floyd and then produce works inspired by their influence yada yada: Someone is a human being with human rights, and if we're going to start pretending that training an LLM is in any way analogous to human consumption and creativity, and not an industrial process that encodes input data into a digital artifact, then let's start by saying LLMs have human rights and cannot be owned by a company that charges for access to them.

>To continue your analogy, I had to pay for Windows before I was allowed to create something with it, or acquire a license for it under terms they set forth.

Yep, and so far it looks like the issue with the Meta case is that they didn't pay for the book. Not that they used it in training data.

>in the most charitable interpretation, using them to create derivative works.

Yeah, in the same way I use a hammer to create a derivative table.

>Someone is a human being with human rights, and if we're going to start pretending that training an LLM is in any way analogous to human consumption and creativity.

I don't care about that. It's simply a tool being built using existing tools. Like using a jigsaw to make a step ladder.

> Yep, and so far it looks like the issue with the Meta case is that they didn't pay for the book. Not that they used it in training data.

Let's not sane-wash what they did here: they didn't just 'forget to pay for the books', they deliberately and illegally downloaded and used material that wasn't theirs to use.

If you or I did that, we would be jailed or sued into destitution. In a fair world, we should either change copyright laws (allowing anyone to freely pirate all media), or Zuckerberg needs to go to jail.

>Let's not sane-wash what they did here: they didn't just 'forget to pay for the books', they deliberately and illegally downloaded and used material that wasn't theirs to use.

Yes. Forgot is your word.

But let's face it, there wouldn't be a case to answer if they had paid retail for each book, torn them up, scanned them, and trained on that data.

>Zuckerberg needs to go to jail.

I am comfortable with that but would prefer updating copyright.

A million dollars please.

It’s called a copyright notice. Same as a license. If you’re running a commercial business you can’t legally just take that piece of work and reuse it. Pick any book off your shelf and pretty well every one of them will have words to the effect of:

All rights reserved. No part of this publication may be reproduced, distributed, or transmitted in any form or by any means, including photocopying, recording, or other electronic or mechanical methods, without the prior written permission of the publisher, except in the case of brief quotations embodied in critical reviews and certain other noncommercial uses permitted by copyright law. For permission requests, write to the publisher, addressed "Attention: Permissions Coordinator," at the address below.

Same as every piece of commercial software has a license which has to be abided by. Same as use of Meta’s service has terms and conditions which HAVE to be agreed to.

So yeah they’re free to break that license but they’re also free to be sued by IP holders for breaking it at scale.

Well, it's not a solved issue in terms of law. But even so, I would have expected you to understand that I wasn't speaking legally.