After the book publishers burned Google Books' Library of Alexandria, they are now making it impossible to train an LLM unless you engage in the medieval process of manually buying paper copies of works just to scan & destroy them...
If they wanted a copyright-free world, maybe they should publish all their models copyright-free as well. But they are not doing that, are they?
Don't come up with logic defending the proletariat /s
There are nondestructive methods of scanning. I bought an edge scanner to scan collectible public domain books for Project Gutenberg.
For recent books, they could buy digital versions of the books and use them for training, though.
I was wondering about this, but digital versions are typically DRM-encumbered and are actually licenses (not true purchases) whose terms probably don't allow this. The court's decision was that training is fair use, but in practice it seems many avenues are blocked.
It reminds me of the theoretically public beaches that are blocked off by privately owned land.
DRM is irrelevant. It only matters if you want to efficiently extract the text.
If you point a camera at an ebook reader, with a little motor to tap "next" on the screen, that's still easier than scanning physical books.
The reason companies aren't using ebooks is that all the publishers and ebook companies make you click through a license stating that "this book is for personal use" (paraphrased).