All of it belongs to Anna's Archive. They may not have the rights to have it, but the data is there no less.

They're asking for support to cover archival and bandwidth.

I can't imagine the mental gymnastics you'd need to go through to make these guys into villains.

If you genuinely can't imagine how anyone would object to somebody taking other people's creative output and distributing it for free against their wishes then you probably need to work on your imagination a little bit.

I'm very firmly opposed to holding back societal and technological progress based on people's egos so that certainly won't be one of my projects.

There's no real harm done. I recall seeing a couple of studies showing that piracy doesn't meaningfully affect sales. If the work was worth anything, it'll get paid back by the thankful reader who can afford to pay.

Destroying the profit motive would cripple human progress more than paywalls ever could.

>If the work was worth anything, it'll get paid back by the thankful reader who can afford to pay.

Comically naive.

Tested and proven to be true, really. You're just being weird about it.

My entire life has been one continuous run down the shit slide driven by "the profit motive".

“Go into yourself. Find out the reason that commands you to write; see whether it has spread its roots into the very depths of your heart; confess to yourself whether you would have to die if you were forbidden to write.

This most of all: ask yourself in the most silent hour of your night: must I write? Dig into yourself for a deep answer. And if this answer rings out in assent, if you meet this solemn question with a strong, simple “I must,” then build your life in accordance with this necessity [...very long quote...] A work of art is good if it has arisen out of necessity. That is the only way one can judge it.” ― Rainer Maria Rilke

Everyone else, please go touch grass, we have enough books about milking farms.

Only it's been shown time and time again that piracy does not destroy the profit motive.

As a personal anecdote, when I used to pirate things, I still bought things in the same category, i.e., I would pirate movies and I still bought movies. I would pirate games and I still bought games.

I don't think it affected how much of each thing I purchased by much, but I don't really know.

Most everything on earth is pretty trivial to pirate. And yet…

That's fine but not really relevant to my point. Saying you can't even imagine how people could have an issue with somebody taking other people's work and distributing it for free is pretty baffling.

Anna's Archive themselves scraped together all this data from other sources. See the notes of origin, for example: often they are from zlib or libgen, et cetera.

It’s the exact same mental gymnastics that cause people to accuse model providers of large-scale plagiarism.

That is to say, not that much gymnastics. Like a cartwheel at most.

I don't really agree with those guys either.

The reason is fairly straightforward: there's no alternative if you need the dataset.

Copyright law makes it a huge amount of effort to get even an incomplete version.

And use in LLMs is transformative, so it would fall under fair use. The only reason they're in trouble with the courts at the moment from my understanding is that they pirated the content instead of idk, ripping it from Libby.

Anna's Archive aren't filing the serial numbers off the epubs they redistribute. Rightfully or wrongly distributed, the attribution is crystal clear.

I don't really care about Anna's Archive, but let's not make them out to be some kind of Robin Hood story.

They have (illegally) scraped and re-hosted mountains of proprietary data and are now deliberately prompt-injecting unwitting LLM users in order to steal money from them too.

That's not a prompt injection.

It's a gentle nudge at most, and if your agent sends them money just for that without you expecting it, you should donate more to thank them for finding your sev 10 bug before someone did an actual prompt injection on it.

> Yes we stole your wallet but it was your fault because you let your wallet be so easy to steal! Now you should give us even more money too!

No, you gave the wallet away.

Edit: or, rather, your synthetic 4 year old savant did. Still, entirely on you.

Illegally scraped?

What about Common Crawl, Zyte, Diffbot, and others?

You have to be pretty unwitting to hand your wallet to a text generation machine.

If you can be tricked into giving someone all your money when they politely ask for it, you weren't going to hold onto your money for very long.