I honestly think that the most extreme take that "any output of an LLM falls under all the copyright of all its training data" is not really defensible, especially when contrasted with human learning, and would be curious to hear conflicting opinions.
My view is that copyright in general is a pretty abstract and artificial concept; thus the corresponding regulation needs to justify itself by being useful, i.e. by encouraging and rewarding content creation.
/sidenote: Copyright as-is barely holds up there; I would argue that nobody (not even old, established companies) is significantly encouraged or incentivised by potential revenue more than 20 years in the future (much less over current copyright durations). The system also leads to bad resource allocation, with almost all the rewards ending up at a small handful of the most successful producers-- this effectively externalizes large portions of the cost of "raising" artists.
I view AI overlap through the same lens-- if current copyright rules would lead to undesirable outcomes (by making all AI training or use illegal/infeasible), then the law/interpretation simply has to be changed.
> I view AI overlap through the same lens-- if current copyright rules would lead to undesirable outcomes (by making all AI training or use illegal/infeasible), then the law/interpretation simply has to be changed
Not sure about undesirable, I so wish we could just ban all generative AI.
I feel profound sadness at having lost the world we had before generative AI became widespread. I really loved programming, and seeing my trade devalued by vibe coding is just heartbreaking. We will see mass unemployment, deepfakes, more AI-induced psychosis, and a devaluing of human art. I hate this new world.
It would be the morally correct thing to ban generative AI, as it only benefits corporations and doesn't improve people's lives but makes them worse.
The training of the big LLMs has been criminal. Whether we talk about GPL-licensed code or the millions of artists who never released their work under a specific license and would never have consented to it being used for training.
I still think states will allow it and legalize the crime, because they believe that AI offers competitive advantages and they fear "falling behind". Plus military use.
Anyone can very easily avoid training on GPL code. Yes, the model might not be as strong as one that is trained on GPL code and released under the terms of the GPL, but to me that sounds like quite a good outcome if the best models are open source/open weight.
It's all about whose outcomes are optimized.
Of course, the law generally favors consideration of the outcomes for the massive corporations donating hundreds of millions of dollars to legislative campaigns.
Would it even actually help to go down that road, though? IMO the expected outcome would simply be that AI training stalls for a bit while "unencumbered" training material is collected/built up, and you achieve basically nothing in the end, except creating a big ongoing logistical/administrative hassle to keep lawyers/bureaucrats fed.
I think the redistribution effect (towards training material providers) from such a scenario would be marginal at best, especially long-term, and even that might be over-optimistic.
I also dislike that stance because it seems obviously inconsistent to me-- if humans are allowed to train on copyrighted material without their output generally being affected, why not machines?
Human learning is materially different from LLM training. They're similar in that both involve providing input to a system that can, afterwards, produce output sharing certain statistical regularities with the input, including rote recital in some cases – but the similarities end there.
>Human learning is materially different from LLM training [...] but the similarities end there.
Specifically, what "material differences" are there? The only arguments I've heard are around human exceptionalism (e.g. "brains are different, because... they just are, ok?"), or giving humans a pass because they're not evil corporations.
Why? I'm pretty sure I can learn the lyrics of a song, and probabilistically output them in response to a prompt.
Is the existence of my brain copyright infringement?
The main difference I see (apart from the fact that I bullshit way less than LLMs) is that I can't learn nearly as much as an LLM, and I can't talk to 100k people at once, 24/7.
I think the real answer here is that AI is a totally new kind of copying, and it's useful enough that laws are going to have to change to accommodate it. What country is going to shoot itself in the foot so badly by essentially banning AI, just so it can feel smug about keeping its 20th-century copyright laws?
Maybe that will change when you can just type "generate a feature length Pixar blockbuster hit", but I don't see that happening for quite a long time.