Buying used copies of books, scanning them, training an employee with the scans: fair use.
Unless legislation changes, model training is pretty much analogous to that. Now of course if the employee in question - or the LLM - regurgitates a copyrighted piece verbatim, that is a violation and would be treated accordingly in either case.
> Buying used copies of books, scanning them, training an employee with the scans: fair use.
Does this still hold true if multiple employees are "trained" from scanned copies at the same time?
If they're all trained simultaneously from a single purchased copy, I guess that would violate copyright, which is an interesting point. Maybe there's a case to be made there with model training.
Regardless, the issue could be resolved by buying as many copies as you have concurrent model training instances. It isn't really an issue with training on copyrighted work, just a matter of how you do so.
Computers aren't people. And analogies aren't laws.
Yes, but a law covering this doesn't exist yet, so until legislation catches up, analogies are all the legal system has to work with.