I don't know if anyone has actually read the article or the ruling, but this is about pirating books.
Anthropic went back and bought->scanned->destroyed physical copies of them afterward... but they pirated them first, and that's what this settlement is about.
The judge also said:
> “The training use was a fair use,” he wrote. “The technology at issue was among the most transformative many of us will see in our lifetimes.”
So you don't need to pay $3,000 per book you train on unless you pirate them.
i agree. this is very gray imo. e.g., books in India have cheap EEE editions compared to the ones in US/Europe. so they can pre-process the data in India & then compile it in US. does that save them from piracy rules & reduces cost as well.
I mean relative to the cost of pre-training, books are going to be cheap even if you buy them in the US (as demonstrated by the fact Anthropic bought them after)
For post-training, other data sources (like human feedback and/or examples) are way more expensive than books