As opposed to an irregular person?

LLMs are not persons, not even legal ones (which itself is a massive hack causing massive issues such as using corporate finances for political gain).

A human has moral value; a text model does not. A human has limitations in both time and memory available; a model of text does not. I don't see why comparisons to humans have any relevance. Just because a human can do something does not mean machines run by corporations should be able to do it en masse.

The rules of copyright allow humans to do certain things because:

- Learning enriches the human.

- Once a human consumes information, he can't willingly forget it.

- It is impossible to prove how much a human-created intellectual work is based on others.

With LLMs:

- Training (let's not anthropomorphize: lossily-compressing input data by detecting and extracting patterns) enriches only the corporation which owns it.

- It's perfectly possible to create a model based only on content with specific licenses or only public domain.

- It's possible to trace every single output byte to quantifiable influences from every single input byte. It's just not an interesting line of inquiry for the corporations benefiting from the legal gray area.

Dude, come on, I clearly wasn't saying LLMs are people. My point was that it's a tool, and it's the responsibility of the person wielding it to check outputs.

If it's too hard to check outputs, don't use the tool.

Your arguments about copyright being different for LLMs: at the moment that's still being defined legally. So for now it's an ethical concern rather than a legal one.

For what it's worth, I agree that LLMs being trained on copyrighted material is an abuse of current human-oriented copyright laws. There's no way this will just continue to happen. Megacorps aren't going to lie down if there's a piece of the pie on the table, and then there's precedent for everyone else (class action perhaps).

Alright, I did make that assumption because I've seen and heard people talk about LLMs as people. It worries me that otherwise functional and reasonable people, some of them my friends, have been so easily convinced by a machine which demonstrates its flaws to me daily.

As for checking outputs - I don't believe that's sufficient. Maybe the letter of the law is flawed, but according to its spirit the model itself is a derivative work.

A model takes several orders of magnitude more work in the form of training data than it takes to code the training algorithm itself. To any reasonable and sane person, that makes it a derivative work of the training data by nearly 100% - we can only argue about how many nines it should be.

> precedent

Yeah, but the US system makes me very uneasy about it. The right way to do this is to sit down, talk about the options and their downstream implications, talk about fairness and justice, and then decide what the law should be. If we did that, copyright law would look very different in the first place and this whole thing would have an obvious solution.