Hacker News

I've pitched this idea before but my pie in the sky hope is to settle most of this with something like a huge rollback of copyright terms, to something like 10 or 15 years initially. You can get one doubling of that by submitting your work to an official "library of congress" data set which will be used to produce common, clean, and open models that are available to anyone for a nominal fee and prevent any copyright claims against the output of those models. The money from the model fees is used to pay royalties to people with materials in the data set over time, with payouts based on recency and quantity of material, and an absolute cap to discourage flooding the data sets to game the payments.

This solution to me amounts to an "everybody wins" situation, where producers of material are compensated, model trainers and companies can get clean, reliable data sets without having to waste time and energy scraping and digitizing it themselves, and model users can have access to a number of known "safe" models. At the same time, people not interested in "allowing" their works to be used to train AIs and people not interested in only using the public data sets can each choose to not participate in this system, and then individually resolve their copyright disputes as normal.