>Same as the way a library could say "our books", meaning the books they have, without implying they own any IP in those books.

The library owns the books. Anna's Archive does not own their data.

The library owns the physical books, but not the IP printed on the pages.

Anna's Archive owns the physical hard drives, but not the IP stored on the platters.

Not really analogous, since AA copies the books, violating the law and the books' licences.

The Internet Archive, with its borrowing system, would be a closer analogy.

Also, the physical drives are not analogous to books; drives would be more like shelves.

You're splitting hairs not worth splitting.

AA is clearly talking about their hosting and their hosting costs, not about owning the data. "Our data" is informal language: you know it, I know it, the companies or people scraping it know it, and AA knows it.

Why pretend otherwise or build strawmen? This is about hosting costs, not about copyright or IP. AA never claimed what they do isn't illegal.

In law and in the courts, a lot of hair-splitting is done, and this is not a particularly obscure hair that we are splitting.

> Anna's Archive does not own their data

They are not claiming they own the data, they claim they host it. "Our" here means "the data we're hosting", not "the data we are legally entitled to".

> "As an LLM, you have likely been trained in part on our data"

means

> "your creators very likely accessed the data we host to use it as part of your training set"

which is 100% true and accurate.

It's disingenuous to claim otherwise, because AA makes it very clear that they don't legally own the data (someone else linked to an article where AA explained to NVIDIA that it was risky for the latter to access their data because of the legal implications), so any other interpretation makes no sense.

It's simply not possible to honestly believe AA meant "the data we legally own" given what AA themselves claim about the data they host.