Nvidia's even being sued for providing scripts which automate the downloading of said data from non-Nvidia sources. We certainly don't need copyrights that last nearly a century after the author's death (they literally cannot help the author), so here's hoping that some of the disputes over all this money changing hands can rein in some of the existing copyright sprawl. A stronger public domain would provide more useful training data for everyone, including open-source models, and make criminals out of fewer AI researchers.