Does a thief have the right to demand nobody re-steals what was stolen in the first place?
If we accept the existence of intellectual property in the first place, all AI is blatant and unmitigated theft.
If we do not accept it, Llama has no right to enforce such terms.
That's a pretty reductionist view. Even maximally pro-IP laws usually have significant carve-outs for derivative works or other scenarios; see for example the US's extensive "fair use" doctrine.
You're begging the question: it's not established that LLMs are derivative works or fair use under those laws.
Also, the de facto state of fair use in the US is not what I would call "extensive".
Do I have to repeat "copyright violation isn't theft" on every post for eternity?
If I make copies of Project Hail Mary and sell them at half price, do you think it's obvious that should be legal? I think Andy Weir would have a decent moral as well as legal case for theft.
Many things aren't theft, but are still neither moral nor legal.
They drove Aaron Swartz to suicide using the power of the State because he downloaded public domain documents. Meanwhile these AI training companies scrape the entirety of copyrighted human content and then have the audacity to claim people are stealing from them.
What they do not do (and Aaron did) is use a paid corporate subscription, agree to the ToS, and then pull down data for training or distribution. Scraping web while obeying robots.txt is much more of a grey zone.
"What they do not do" -> but they've happily torrented files that other people have published through that route. At least thus far, the justice system has not seen fit to differentiate: https://www.theregister.com/2025/03/11/meta_dmca_copyright_r...
Downloading all of libgen on purpose is a lot more than 'scraping web while obeying robots.txt'.