Settlement Terms (from the case PDF)

1. A Settlement Fund of at least $1.5 Billion: Anthropic has agreed to pay a minimum of $1.5 billion into a non-reversionary fund for the class members. With an estimated 500,000 copyrighted works in the class, this would amount to an approximate gross payment of $3,000 per work. If the final list of works exceeds 500,000, Anthropic will add $3,000 for each additional work.

2. Destruction of Datasets: Anthropic has committed to destroying the datasets it acquired from LibGen and PiLiMi, subject to any legal preservation requirements.

3. Limited Release of Claims: The settlement releases Anthropic only from past claims of infringement related to the works on the official "Works List" up to August 25, 2025. It does not cover any potential future infringements or any claims, past or future, related to infringing outputs generated by Anthropic's AI models.
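A quick sanity check of the arithmetic in term 1, as a minimal sketch. It assumes the fund works as summarized above: a $1.5 billion floor, plus $3,000 for each work beyond 500,000. The function and constant names are made up for illustration, and the figures are gross, before any fees or costs come out.

```python
# Figures taken from the summary of term 1 above; names are hypothetical.
BASE_FUND_USD = 1_500_000_000   # minimum fund size
BASE_WORKS = 500_000            # estimated number of works in the class
PER_EXTRA_WORK_USD = 3_000      # added for each work beyond BASE_WORKS


def settlement_fund(works: int) -> int:
    """Total fund in USD for a final Works List of the given size."""
    extra_works = max(0, works - BASE_WORKS)
    return BASE_FUND_USD + extra_works * PER_EXTRA_WORK_USD


def gross_per_work(works: int) -> float:
    """Approximate gross payment per work, before any deductions."""
    return settlement_fund(works) / works


if __name__ == "__main__":
    for n in (400_000, 500_000, 600_000):
        print(n, settlement_fund(n), gross_per_work(n))
```

Under these assumptions the gross amount stays at $3,000 per work once the list reaches 500,000 works or more, and only rises above that if the final list turns out shorter.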

Don't forget: NO LEGAL PRECEDENT! which means, anybody suing has to start all over. You only settle in this scenario/point if you think you'll lose.

Edit: I'll get ratio'd for this, but it's the exact same thing Google did in its lawsuit with Epic. They delayed while the public and courts focused on Apple (oohh, EVIL Apple). Apple lost, and Google settled at a disadvantage before there was a legal judgment that couldn't be challenged later.

I thought the courts decided against Google in Google vs Epic? It was even appealed and upheld. Are you thinking of another case? https://en.m.wikipedia.org/wiki/Epic_Games_v._Google

Or, if you think your competition, also caught up in the same quagmire, stands to lose more by battling for longer than you did?

A valid touché! I still think Google went with delaying tactics as public and other pressures forced Apple's case forward at greater velocity. (Edit: implicit "and then caved when Apple lost"... because they're the same case)

> You only settle in this scenario/point if you think you'll lose.

Or because you already got the judgement you wanted. Remember, Anthropic's training of the AI was determined to be fair use for all the legally acquired items, which Anthropic claims is their current acquisition model anyway. If we assume that's true for the sake of argument, there's no point in fighting a battle on the remaining part unless they have something to gain by it. Since they're not doing that anymore, they don't gain, and run a very high risk of losing more. From a purely PR perspective, this is the right move.

There is already a mountain of legal precedent that you can't just download copyrighted work. That's what this lawsuit is about. Just because one of the parties is Anthropic doesn't mean this is some new AI thing.

A full case is many more years of suits and appeals with high risks, so it's natural to settle, which obviously means no precedent.

Won't Facebook just get sued for the same thing now and maybe set precedent?

I thought meta had been sued and forgiven as it was impulsive that they do it to make money and faced no charge.

This is what is confusing me here. I did not really follow any case, but as far as I remember Meta seems to have gotten away with pirating books, but Anthropic needs to pay $1.5B?

So they can also keep models trained on the datasets? That seems pretty big too, unless the half life of models is so low it doesn't matter.

It's a separate suit being waged against Meta and OpenAI etc.

There's piracy, then there's making available a model to the public which can regurgitate copyrighted works or emulate them. The latter is still unsettled

So... it would be a lot cheaper to just buy all of the books?

Yes, much.

And they actually went and did that afterwards. They just pirated them first.

What is the HN term for this? "Bootstrapping" your start up? Or is it "growth-hacking" it?

Bookstrapping

The latter (I know you're joking, but...)

Bootstrapping in the startup world refers to starting a startup using only personal resources instead of using investors. Anthropic definitely had investors.

Where can I find source that says Anthropic bought the pirated books afterwards? I haven't seen this in any official document.

Also, do we know if the newer models were trained without the pirated books?

> Where can I find source that says Anthropic bought the pirated books afterwards? I haven't seen this in any official document.

https://storage.courtlistener.com/recap/gov.uscourts.cand.43...

> Also, do we know if the newer models were trained without the pirated books?

I'm pretty sure we do but I couldn't swear to it or quickly locate a source.

Thanks for the link.

Among the several places where the judge mentions Anthropic buying legit copies of books it pirated, this sentence is probably the most relevant: "That Anthropic later bought a copy of a book it earlier stole off the internet will not absolve it of liability for the theft but it may affect the extent of statutory damages."

But document does not say Anthropic bought EVERY book it pirated. Other sections in the document also don't explicitly say that EVERY pirated book was later purchased.

I stopped using Claude when this case came to light. If the newer Claude models don't use pirated books, I can resume using it.

When you say, "I'm pretty sure we do...", do you mean that pirated books were used, or were they not used?

> But document does not say Anthropic bought EVERY book it pirated

Yeah, I wouldn't make this exact claim either. For instance it's probably safe to assume that the pirate datasets contain some books that are out of circulation and which Anthropic happened not to get a used copy of.

They did happen to get every book published by any of the lead plaintiffs though, as a point towards them probably having pretty good coverage. And it does seem to have been an attempt to purchase "all" the books for reasonable approximate definitions of "all".

> When you say, "I'm pretty sure we do...", do you mean that pirated books were used, or were they not used?

I'm pretty sure pirated books were not used, but not certain, and I really don't remember when/why I formed that opinion.

That might be practically impossible given the number of rights holders worldwide

The permission to buy them was already settled by Google Books in the 00's.

They did, but only after they pirated the books to begin with.

Few. This settlement potentially weakens all challenges to the use of copyrighted works in training LLMs. I'd be shocked if behind closed doors there wasn't some give and take on the matter between executives/investors.

A settlement means the claimants no longer have a claim, which means if they're also part of, say, the New York Times affiliated lawsuit, they have to withdraw. A neat way of kneecapping a countrywide decision that LLM training on copyrighted material is subject to punitive measures, don't you think?

That's not even remotely true. Page 4 of the settlement describes released claims which only relate to the pirating of books. Again, the amount of misinformation and misunderstanding I see in copyright related threads here ASTOUNDS.

Did you miss the "also"? How about "adjacent"? I won't pretend to understand the legal minutiae, but reading the settlement doesn't mean you do either.

In my experience&training in a fintech corp- Accepting a settlement in any suit weakens your defense- but prevents a judgement and future claims for the same claims from the same claimants (a la double jeopardy). So, again- at minimum- this prevents an actual judgement. Which, likely would be positive for the NYT (and adjacent) cases.

I'm not sure how your confusion about what's going on is being projected to me. What about "also" what about "adjacent"?

>In my experience&training in a fintech corp- Accepting a settlement in any suit weakens your defense- but prevents a judgement and future claims for the same claims from the same claimants (a la double jeopardy). So, again- at minimum- this prevents an actual judgement. Which, likely would be positive for the NYT (and adjacent) cases.

Okay? I'm an IP litigator and you clearly have no idea what you're talking about. The only thing left to try in this case was the book library piracy. Alsup's fair use decision is just as relevant and is not mooted by the settlement, and will be cited by anyone who thinks it's favorable to them.

Thank you. I assumed it would be quicker to find the link to the case PDF here, but your summary is appreciated!

Indeed, it is not only the payout, but also the destruction of the datasets. Although the article does quote:

> “Anthropic says it did not even use these pirated works,” he said. “If some other generative A.I. company took data from pirated source and used it to train on and commercialized it, the potential liability is enormous. It will shake the industry — no doubt in my mind.”

Even if true, I wonder how many cases we will see in the near future.

Only 500,000 copyrighted works?

I was under the impression they had downloaded millions of books.

Individual authors had to join the class action lawsuit, sadly. They were not all automatically registered for each violation.

I’m an author, can I get in on this?

I had the same question.

It looks like you'll be able to search this site if the settlement is approved:

> https://www.anthropiccopyrightsettlement.com/

If your work is there, you qualify for a slice of the settlement. If not, you're outta luck.

I didn't see a way to search for my book there, but there definitely is an author intake form.

This site references Meta, but the training corpus probably has some overlap? Maybe?

https://www.theatlantic.com/technology/archive/2025/03/searc...

I'm an author. Can I get Anthropic stock instead?