They drove Aaron Swartz to suicide using the power of the State because he downloaded public domain documents. Meanwhile these AI training companies scrape the entirety of copyrighted human content and then have the audacity to claim people are stealing from them.
What they do not do (and Aaron did) is use a corporate paid subscription, agree to the ToS, and then pull data for training or distribution. Scraping the web while obeying robots.txt is much more of a grey zone.
"What they do not do" -> but they've happily torrented files that other people have published through that route. At least thus far, the justice system has not seen fit to differentiate: https://www.theregister.com/2025/03/11/meta_dmca_copyright_r...
Downloading all of libgen on purpose is a lot more than 'scraping web while obeying robots.txt'.