I've had this idea kicking around in my head now for a few months that this is an opportunity to update copyright / IP law generally, and use the size and scope of government to do something about both the energy costs of AI and compensation for people whose works are used. At a very rough draft and high level it goes something like this:

Update copyright to an initial 10 year limit, granted at publication without any need to register. This 10 year period also works just like copyright today, the characters, places, everything is projected. After 10 years, your entire work falls into the public domain.

Alternatively, you can register your copyright with the government within the first 3 years. This requires submitting your entire work in a machine readable specified format for integration into official training sets and models. These data sets and models will be licensed by the government for some fee to interested parties. As a creator with material submitted to this data set, you will receive some portion of those licensing feed, proportional to the quantity and amount of time your material has been in the set, with some caps set to prevent abuse. I imagine this would work something like the broadcast licensing for radios works. You will receive these licensing fees for up to 20 years from the first date of copyright.

During the first 10 years, copyright is still enforced on your work for all the same things that would normally be covered. For the 10 years after that, in additional consideration for adding your work to the data sets, you will be granted an additional weaker copyright term. The details would vary by the work, but for a novel for example, this might still protect the specific characters and creatures you created, but no longer offer protection on the "universe" you created. If we imagine Star Wars being created under this scheme, while Darth Vader, Luke Skywalker and Leia Organa might still be protected from 1987-1997, The Empire, Tatooine, and Star Destroyers might not be.

What I envision here is that these government data sets would be known good, clean, properly categorized and in the case of models, the training costs have already been paid once. Rather than everyone doing a mad dash to scrape all the world's content, or buy up their own collection of books to be scanned and processed, all of that work could already have been done and it's just a license fee away. Additionally because we're building up an archive of media, we could also license custom data sets. Maybe someone wants to make a model trained on only cartoons, or only mystery novels or what have you. The data is already there, a nominal fee can get you that data, or maybe even have something trained up, and all the people who have contributed to that data are getting something for their work, but we're also not hamstringing our data sets to being decades or more out of date because Disney talked the government into century long copyrights decades ago.