> Buying used copies of books, scanning them, and training on it is fine.
Buying used copies of books, scanning them, printing them, and selling them: not fair use.
Buying used copies of books, scanning them, making merchandise, and selling it: not fair use.
The idea that training models is fair use just because you bought the work is naive. Fair use is not a catch-all that permits any use that doesn’t fit some prohibited description. It’s a law that specifically allows certain uses like criticism, comment, news reporting, teaching, scholarship, or research. Training AI models for anything other than purely academic purposes fits none of these.
Buying used copies of books, scanning them, training an employee with the scans: fair use.
Unless legislation changes, model training is pretty much analogous to that. Of course, if the employee in question (or the LLM) regurgitates a copyrighted piece verbatim, that is a violation and would be treated as such in either case.
> Buying used copies of books, scanning them, training an employee with the scans: fair use.
Does this still hold true if multiple employees are "trained" from scanned copies at the same time?
Simultaneously? I guess that would violate copyright, which is an interesting point. Maybe there's a case to be made there against model training.
Regardless, the issue could be resolved by buying as many copies as you have concurrent model training instances. It isn't really an issue with training on copyrighted work, just a matter of how you do so.
Computers aren't people. And analogies aren't laws.
Yes, but the relevant law doesn’t exist yet, so until it catches up, analogies are all the legal system has to work with.
The purpose and character of AI models is transformative, and the effect of the model on the copyrighted works used in the model is largely negligible. That's what makes the use of copyrighted works in creating them fair use.
Are "fantasy name generators" of the sort you find all over the place online fair use if their generators are weighted using statistical information about names in fantasy novels? I would think most people would agree they're fair use, or, even if they wouldn't put it in so many words, would find it pretty unfair for WotC to go around suing sites for running D&D character name generators.
Or let's talk about another form of buying copyrighted/protected content and selling the results of transforming it: emulators. The Connectix Virtual Game Station was the impetus for one of the most important lawsuits about emulation (Sony v. Connectix). The ruling held that even though developing the emulator involved copying Sony's copyrighted BIOS code during reverse engineering, that copying fell under fair use and the resulting product was sufficiently transformative.
It fits the most basic fair use of all: reading them. Current "training" can be considered a coarse form of reading.