This is such an insane take.
At this point, I think as a society we need to just admit that copyright, as both a concept and a body of law, has completely failed, and scrap the whole thing.
The 0.01% of powerful copyright-cartel publishers get rich while harming the other 99.99% of people: we've seen further erosion of fair use rights, absurdly lengthy copyright term extensions to prop up Disney's profits, ever more expansive interpretations of how much control copyright holders have, and zero punishment for abuse of the DMCA, among other things.
Students should be able to learn from books, music, and film. So should AI models.
If there is any ambiguity about this, we should immediately write laws making it clear that training and education of all forms are explicitly allowed under fair use. Ideally, we'd also send anyone trying to prevent this to the guillotine.
I actually agree with you. I think what the LLM craze has shown is that copyright/IP laws need to adapt, not the other way around.
I think it should be legal to train a model on anything that is legal to scrape (which is almost everything).
Then, if someone uses a generative AI output in a way that infringes someone's existing IP, go after the person trying to monetize that output, whether it's software, an image, or writing.
The thing is, if you limit what these things can be trained on, it creates a huge power imbalance. The wealthy and nation-states are still going to scrape everything under the sun and train AIs on that data, along with whatever else their surveillance has gathered. If ordinary businesses are neutered and unable to do the same, we all lose.
I have whiplash from your first and last sentences.
> Students should be able to learn from books, music, and film. So should AI models.
An AI model is a thing. It is owned and fully controlled by some agent. A student is a sentient, thinking being. Both can be trained; only one can be educated. Treating the two as comparable is misleading and, in my view, wrong.
We're in strange new times, but the equivalence of human and synthetic cognition will likely become mainstream and mundane in the coming years.
Sci-fi has long used various "cyborg" types as a plot element, but if you walk down the street in NYC today you'll pass thousands of people with pacemakers, artificial hips, insulin pumps, colostomy bags, and prosthetics, people who've had laser eye surgery to see better or received transplanted organs. Add to that everyone wearing smartwatches that measure heart rate, steps, and sleep quality, or continuous blood glucose monitors.
We don't marvel at the cyborgs among us; we just accept it as modern medicine. Similarly, we've gotten used to internet search and GPS turn-by-turn navigation. Gen Z and younger will probably accept the integration of genAI into their everyday lives as seamlessly and casually as we accepted our cyborgification.
You can say that an AI model can only "be trained, not educated" in the same way you can argue that a submarine doesn't swim. But does that really matter to any of the people using it?
You are preoccupied with semantics and romantic notions of blurred lines between people and software, rather than the actual reality of what a model is, and who tends to control it. The "people" training models are mostly massive business interests that exist to create profit.
Fine then, let's get rid of software copyrights too. We can copy the AI companies' software, models, and datasets all we want. They don't get to keep copyright protection for their software while declaring that everybody else loses copyright protection for their work.
Pointless distinction: you'll never see their code or weights if all you get is a response from the API, so the license doesn't matter.