I never understand why anyone wants authors to not be able to enforce copyright and licensing laws for AI training. Unless you are Anthropic or OAI it seems like a wild stance to have. It’s good when people are rewarded for works that other people value. If trainers don’t value the work, they shouldn’t train on it. If they do, they should pay for it.
My own view is, I thought we were all agreed that the idea that Microsoft can restrict Wine from even using ideas from Windows, such that people who have read the leaked Windows source cannot contribute to Wine, was a horrible abuse of the legal system that we only went along with under duress? Now when it's our data being used, or more cynically when there's money to be made, suddenly everyone is a copyright maximalist.
No. Reading something, learning from it, then writing something similar, is legal; and more importantly, it is moral. There is no violation here. Copyright holders already have plenty of power; they must not be given the power to restrict the output of your brain forever more for merely having read and learnt. Reading and learning is sacred. Just as importantly, it's the entire damn basis of our profession!
If you do not want people to read and learn from your content, do not put it on the web.
If you want people to read and learn from each other, you should incentivize people to make content worth reading and learning from. Making LLM training a viable loophole for copyright law means there won’t be incentives to produce such work.
I don't think that's the case.
People getting better at writing is only going to increase the quality of the output.
Increasing both competition and tooling (by providing every writer with the world's greatest encyclopedia/thesaurus/line-editor/brainstormer/planner/etc.) is only going to make writers better.
Will there be lots of people who misuse the system? Are there lots of people who use thesaurus words without knowing what they're talking about? Can't you tell the difference?
I see in LLMs a lowering of the ground floor making it easier for people to get in. This will increase the total availability of content.
I also see in LLMs a raising of the top bar making it harder to be the best. If more people are writing and more people are trying to be the best, the best is going to get better.
Consider chess. Have we suddenly stopped playing chess now that a phone can beat 95+% of people? No. The market is stronger than ever and still growing. The greatest players in the world use the chess algorithms to refine their play, and the play keeps expanding in new and interesting ways.
In both writing and chess, yes, there is an explosion of low and middling play. But when have we not had people producing content and playing chess that, compared to the masters of the field, is generally viewed as substandard?
But here's the kicker. Some people's favorite genre is badly edited fanfic. Some people genuinely derive actual pleasure from things that you or I might call garbage. And what's wrong with that? Who am I to say that you can't love klutzy firecop loves suburban housewife paperbacks? Or Zelda/Harry Potter crossfics or whatever.
Re-reading your comment, I think we’re both generally anti-corporate-fuckery. I view the current batch of copyright pearl clutching to be an argument about whether VCs are allowed to steal books to make their chatbots worth talking to, and the Wine/MSoft debate to be about whether it should be legal to engage in anticompetitive behavior through restrictive use of copyright. In both of these cases the root of the issue isn’t really copyright in the abstract; it’s the bludgeoning of the person with less money by use of overwhelming legal costs to have a day in court.
>I never understand why anyone wants authors to not be able to enforce copyright and licensing laws for AI training.
Fair use is part of "copyright and licensing laws".
Would using an actor's face and voice as training data be fair use?
What if the model then creates a virtual actor that is very close to the real actor?
>What if the model then creates a virtual actor that is very close to the real actor?
"Likeness" is a separate concept from copyright:
https://en.wikipedia.org/wiki/Personality_rights
I wish I lived in the alternative timeline where open source folks didn't look a gift horse in the mouth and actually used these tools to copyleft the shit out of software, to the point where proprietary closed source software has no advantage.
But instead we've got people posting "honey pots" that an LLM will immediately detect and route around.
I bet we'd cure all cancers in a month if everyone whining about slop actually went and did something about it.