Thanks for this info. I was looking for which pirated books were used for which model.
Ethically speaking, if Anthropic (a) did later purchase every book it pirated or (b) compensated every author whose book was pirated, would it absolve an illegally trained model of its "sins"?
To me, the taint still remains. Which is a shame, because it's considered the best coding model so far.
> Ethically speaking, if Anthropic (a) did later purchase every book it pirated or (b) compensated every author whose book was pirated, would it absolve an illegally trained model of its "sins"?
No, in part because it removes agency from the authors/rightsholders. Maybe they don't want to sell Anthropic their books, maybe they want royalties, etc.
Can authors even claim such rights, though? I doubt they even had such agency to begin with.
If they're the rightsholders, they can do whatever they want with their IP, including changing licensing terms, adding contractual obligations forbidding certain types of use, forbidding sale, etc.
I feel like that would bounce hard off first sale doctrine. But what do I know.
You still have to adhere to license and copyright terms after first sale.
You can't sell a Blu-ray disc to a movie theater and give them the right to charge an audience to watch it in the theater later.
If rightsholders are worried about certain uses of their IP being found to be fair use, they might then change the terms of release contractually to stop or at least partially prevent training.