Isn't this about generating output after all?
I'm not sure if I get your distinction about "consumption".
> Do the current copyright laws not already protect the authors and give them tools for takedowns and remuneration?
That was also my point in the prior HN comment thread on the MS news submission that I mentioned.
Good luck starting "fair use" copyright lawsuits against a myriad of auto-generated derivatives. This was already hard for naïve creators with humans and (mostly) human-run corporations on the other end.
If the goal is to prevent companies from training on copyrighted material, then yes, it is about consuming the material, not generating it. The generation part comes from anecdotal incidents where some copyrighted material has been generated.
- This is not the norm
- This can change over time, and there are also moderation techniques that can be used.
- We already have remedies for those publishing or selling copyrighted material
So I personally see a difference between training time and inference time. Using the potential for copyrighted material to be generated as a reason to prevent its use at training time is... luddite territory... imho
I'm not a luddite.
And I don't think that my argument was as narrow as you make it out to be.
An AI doesn't have to reproduce training material exactly to output something that wouldn't survive a "fair use" trial.
"Summarize XY, but prefer different words" is already enough for a blog post. And the ability to do that is not limited to inference-time input.
Copyright law is about humans, not machines. The problem is scale. You deflected this argument instead of addressing it.
And regarding training: you seem to anthropomorphize LLMs in a weird way.
LLMs can only generate content that is entirely derived from their training data.
That the derivation is close to a black box for humans does not elevate machines to humans.
The burden of proof about training materials is IMO with LLM companies, not with human creators.
Because companies know full well that anything that's not an obvious exact reproduction will require humans to start lawsuits in order to claim a copyright violation.
You say:
> - We already have remedies for those publishing or selling copyrighted material
And I say, with regard to AI, you seem to be intentionally misinterpreting my comment.