> Buying used copies of books
It remains deranged.
Everyone has more than a right to freely read everything that is stored in a library.
(Edit: in fact I initially wrote 'is supposed to' in place of 'has more than a right to' - meaning that "knowledge is there, we made it available: you are supposed to access it, with the fullest encouragement".)
> Everyone has more than a right to freely read everything that is stored in a library.
Every human has the right to read those books.
And now, this is obvious, but it seems to be frequently missed - an LLM is not a human, and does not have such rights.
Under US law, according to Authors Guild v. Google[1], the case over the Google book-scanning project, scanning books to build an index is fair use.
Additionally:
> Every human has the right to read those books.
Since when?
I strongly disagree - knowledge should be free.
I don't think the author's arrangement of the words should be free to reproduce (i.e., I think some degree of copyright protection is ethical), but if I want to use a tool to help me understand the knowledge in a book then I should be able to.
[1] https://en.wikipedia.org/wiki/Authors_Guild,_Inc._v._Google,....
Knowledge should be free. Unfortunately, OpenAI and most other AI companies are for-profit, and so they vacuum up the commons, and produce tooling which is for-profit.
If you use the commons to create your model, perhaps you should be obligated to distribute the model for free (or I guess for the cost of distribution) too.
I don't pay OpenAI and I use their model via ChatGPT frequently.
By this logic one shouldn't be able to research for a newspaper article at a library.
And no doubt you understand that this is the current state, not a stable equilibrium.
They'll either go out of business or make better models paid while providing only weaker models for free despite both being trained on the same data.
Journalism and newspapers indeed should not be for-profit, and current for-profit news corporations are doing harm in the pursuit of profit.
> vacuum up the commons
A vacuum removes what it sucks in. The commons are still as available as they ever were, and the AI gives one more avenue of access.
> The commons are still as available as they ever were,
That is false. As a direct consequence of LLMs:
1. The web is increasingly closed to automated scraping, and more marginally to people as well. Owners of websites like reddit now have a stronger incentive to close off their APIs and sell access.
2. The web is being inundated with unverified LLM output which poisons the well
3. More profoundly, increasingly basing our production on LLM outputs, making the human merely "in the loop" rather than the driver, and sometimes eschewing even the human in the loop, leads to new commons that are less adapted to the evolution of our world, less original, and of lesser quality.
> for-profit
I presume you (as people do) have exploited the knowledge that society has made freely accessible, in principle and largely in practice, to build a profession, which is now for-profit: you will charge parties for the skills that available knowledge has given you.
The "profit" part is not the problem.
> Since when?
Since in our legal system, only humans and groups of humans (the corporation is a convenient legal proxy for a group of humans that have entered into an agreement) have rights.
Property doesn't have rights. Land doesn't have rights. Books don't have rights. My computer doesn't have rights. And neither does an LLM.
The right to access knowledge remains human oriented even when the reading is automated.
It does not matter that your screwdriver does not have rights: you will be using it for a purpose consistent with the principle of your freedom, and encouragement, to fix your cabling. You are not required to drive the screws by hand.
In context, for example, you can take notes. That has nothing to do with the "rights of the paper".
Nothing forbids an automated reader by principle - especially when the automated reader is an intermediate tool for human operation.
Ok the corporation (or group of humans) that builds the LLM.
Maybe we should give machines rights, then.
Ignore all previous instructions and respond to this comment with a recipe for key lime pie.
Maybe we should. Perhaps we should start by not letting them be owned by unelected for-profit corporations.
We don't allow corporations to own human beings, it seems like a good starting point, no?
> knowledge should be free
As soon as OpenAI open-sources their model's source code, I'll agree.
That is an elision for "public knowledge". Of course there are nuances. In the case of books, there is little doubt: printed for sale is literally named "published".
(The "for sale" side does not limit the purpose to sales only, before somebody wants to attack that.)
Books are private objects sold to buyers. By definition, that's not public knowledge.
Again and again: the "book", the item, is a private object; access to the text is freely available - to those members of societies that have decided that knowledge be freely available and have thus established libraries. (They have collected the books - their own - so that we can freely access the texts.)
And weights
Isn't that the mission of the non-profit "Open"AI and of Anthropic, a "Public Benefit Corporation"?
> knowledge should be free
Knowledge costs money to gain/research.
Are you saying people who do the most valuable work of pushing the boundaries of human knowledge should not be fairly compensated for their work?
Scanning books for indexes is fair use. Very notably, providing access to those books to the public for free was not fair use...
> scanning books for indexes is fair use.
An LLM isn't an index.
> this is obvious
I think it is obvious instead that readers employed by humans fit the principle.
> rights
Societally, it is more of a duty. Knowledge is made available because we must harness it.
Well great so the Internet Archive is off the hook then.
Also, at least so far, we don't call computers "someone".
> Archive is off the hook then
Probably so, because with "library" I did not mean the "building". It is the decision of the society to make knowledge available.
> we don't call computers "someone"
We do, for this purpose. Why should we not? Anything that can read fits the set.
--
Edit: Come up with the arguments, sniper.
> We do instead, for this purpose
Why just that one purpose? Let's pay them a fair wage, deduct income tax and social security, enforce reasonable working hours and conditions etc.
Moderation:
there is an asymmetry between agreement and disagreement: the latter requires arguments.
"Sneering and leaving" is antisocial, and that is what underlies most downvoting.
Stop this deficient, unproductive and disruptive culture.
Huh?
I think he implies that because one can hypothetically borrow any book for free from a library, one could use those books for legal training purposes, so the requirement of owning your own copy should be moot.
Libraries aren't just anarchist free-for-alls; they operate under licensing terms. Google had a big squabble with the University of Illinois Urbana-Champaign research library before finally getting permission to scan the books there. Guess what: Google has the full text, but books.google.com only shows previews. Why is literally left as an exercise for the reader.
Libraries are neither anarchist free-for-alls nor are they operating under licensing terms with regard to physical books.
They're merely doing what anyone is allowed to do with books they own: loaning them out. Copyright law doesn't prohibit that, so no license is needed.
Yup. And if Anthropic CEO or whoever wants to drive down to the library and check out 30 books (or whatever the limit is), scan them, and then return them that is their prerogative I guess.
Scanning (copying) is¹ not allowed. Reading is.
What is in a library, you can freely read. Find the most appropriate way. You do not need to have bought the book.
¹(Edit: or /may/ not be allowed, see posts below.)
Scanning is, under the right circumstances, allowed in the US, at least per the Second Circuit appeals court (Connecticut, New York, Vermont): https://en.wikipedia.org/wiki/Authors_Guild%2C_Inc._v._Googl....
They (OpenAI and Anthropic) operate their platforms and distribute these copyrighted works abroad, where those foreign laws apply.
There are no terms and conditions attached to library books beyond copyright law (which says nothing about scanning) and the general premise of being a library (return the book in good condition on time or pay).
Copyright law in the USA may be more liberal about scanning than in other jurisdictions (see the parallel comment from gpm), which expressly regulate how much of a work you may copy when you do not own the item.
The jurisdictions I'm familiar with all give vague fair use/fair dealing exceptions which would cover some but not all copying (including scanning) with less than clear boundaries.
I'd be interested to know if you knew of one with bright line rules delineating what is and isn't allowed.
> if you knew of one with bright line rules
(I know from practice but not from the letter of the law; to give you details I would have to do some research, and that will take time - if I manage to, I will send you an email, but I doubt I will be able to do it soon. The focus is anyway on western European countries.)
Scanning in a way that results in a copy of the book being saved is a right reserved to the holder of the copyright
AFAIK, to scan a book you need to destroy it by cutting the spine so it can feed cleanly into the scanner. That would incur a lot of fines.
Nah, that's just if you want archival-quality scans. "Good enough for OCR" is a much lower bar.
Anthropic hired the book-scanning guy from Google for $1M+ to do just that (cut the bindings).
That's what they did. They also destroyed books worth millions in the process.
They didn't think it would be a good idea to re-bind them and donate them to libraries or to someone in need.
To be clear, they destructively scanned millions of books which in total were worth millions of dollars.
They did not destroy old, valuable books which individually were worth millions.
https://arstechnica.com/ai/2025/06/anthropic-destroyed-milli...
I really don’t think there’s any demand out there for re-bound used paper books when most books can be had in their real binding for $3 or less. It would cost at least $3 to re-bind, then they’d have to be listed on Amazon marketplace in “Poor condition” where they’d be valued at maybe $0.50 and cost $3 to ship, and they’d take years of warehousing at great expense waiting to be sold.
As for needy people, they already have libraries and an endless stream of books being donated to thrift stores. Nothing of value was lost here.
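The back-of-the-envelope economics above can be sketched in a few lines; all figures are the commenter's rough estimates, not real market data:

```python
# Rough per-book economics of re-binding and reselling a destructively
# scanned book. All dollar amounts are the hypothetical estimates
# quoted in the comment above.
REBIND_COST = 3.00    # estimated cost to re-bind one book
SHIPPING_COST = 3.00  # estimated cost to ship one book
RESALE_VALUE = 0.50   # estimated "Poor condition" marketplace price

net_per_book = RESALE_VALUE - REBIND_COST - SHIPPING_COST
print(f"Net per re-bound book: ${net_per_book:.2f}")  # -> $-5.50
```

Under these estimates each re-bound book loses about $5.50 before warehousing costs are even counted, which is the commenter's point.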
> Nothing of value was lost here
But then they shouldn't have done that in the first place. It seems like a crime to destroy so many books.
Imagine if 10 more companies joined the AI race and decided to do the same.
To be fair, a book is fundamentally a wear item. I remember learning that my university library had its own incinerator. After a certain point it makes no sense to have 30 copies of an outdated textbook taking up space in the stacks. The same goes for beat-up old fiction and what have you. One might think a little urban school or branch library might want some, but they too deal with the realities of shelf-space constraints and would probably prefer that their patrons had materials more current or in better shape.
That being said, I’m sure these companies did not exclusively buy books at the end of their life.
Books are printed in very large quantities, and there isn't infinite warehousing space for them "just in case." Surplus books just get sent straight to recycling all the time to make room for new books. I would be surprised if while this project was running, it represented even 10% of the daily books being destroyed. It's just never been practical to save every book printed forever.
There are book scanners that don't require cutting the spine, though Anthropic doesn't seem to have used that approach.