Well, it is their data.
The word "their" is overloaded: it could mean "thing I have the legal right to" or "thing I have in my possession right now".
The latter condition is clearly true. It's their data.
If you pretend the other definitions of possession don't exist and claim "aktually it's not theirs they don't have rights to it" then that's on you for faking an incomplete understanding of language.
Well, but if it’s the latter definition, then the AI didn’t train on their data, since the companies took possession of that data before doing a training run.
It’s only the former definition that would allow an AI model to have been trained on someone else’s data.
> It’s only the former definition that would allow an AI model to have been trained on someone else’s data.
There are yet more definitions of "theirs". For example, data whose provenance can be traced back to Anna's Archive.
So the data is legally owned by the book authors, possessed by Anna's Archive, and downloaded for training usage by the AI companies. Every person in that chain could, linguistically speaking, correctly refer to the data as "theirs", or refer to the data of a different entity as "theirs".
It's their servers, sure, but if you download something under a license that doesn't grant you ownership, then it isn't yours.
You are being granted a license to use the data.
Yes, exactly, if you ignore all definitions of "yours" that involve possession then it isn't "yours".
But no one else is obligated to ignore the definitions of words that you're choosing to ignore, so the rest of us will go on saying it's their data.
If you steal my car, no one who knows it's stolen would say it's "yours".
We're not talking abstract language concepts, this is a specific case. The data was taken without license/rights/approval. It's stolen. AA calling it "our data" is disingenuous. Legally it isn't theirs. While you could use "ours"/"theirs" loosely in English, they knew it wasn't true in a legal sense when publishing this.
Taking someone else's car illicitly is theft, because theft means taking with intent to deprive the rightful owner of it. Copying can never be theft, only moving can be theft, because only moving it could deprive the rightful owner of it. An illicit copy is merely copyright infringement or a breach of contract or various other concepts that are not theft despite people sometimes using that word as shorthand. It's YOUR illicit copy, not the rightful owner's illicit copy.
I didn't "steal" your passwords, I just "copied" them. I don't know what you're getting so upset about, you still have your list of passwords, and the fact that my changing all your accounts' passwords rendered that list worthless did nothing to move it.
It means whatever is convenient. If you are looking to monetize knowledge, you would use it like "your car"; halfway, your books are just books you've purchased a copy of; at the other end, your car is now mine.
I found an abandoned bicycle 10 years ago. I have since replaced nearly all parts of it. I would give it back if you can prove it is yours, but who owns the bicycle of Theseus is more of an opinion.
I refer to it as my bicycle.
> If you steal my car, no one who knows it's stolen would say it's "yours".
The chop shop well might.
Or, if I steal your car, and then go on to use it daily for the next 10 years, at some point everyone I know will refer to it as "my" car even if they're all entirely aware it was stolen.
> they knew it wasn't true in a legal sense when publishing this
I'm not sure why you're expecting the operators of a pirate site to use legally rigorous terms to refer to themselves in a blog post. This is an error in your expectations, not their terminology.
> The data was taken without license/rights/approval. It's stolen.
That's incorrect. A license violation isn't theft. Theft deprives others of their property; that's not what's going on here. Intellectual property is a fictional "ownership" that provides value to society, but it is much newer than, and different from, the actual ownership of property.
No one actually owns a collection of words or ideas or thoughts.
Yet the main holders of this position were caught saying "our data". Don't you see the irony?
Guess what, the AI companies training their models aren't going to include themselves in the "rest of us".
The AI companies training their models are going to refer to it as their own data, once it's on their servers.
"but if you download something under a license that doesn't grant you ownership, then it isn't yours."
Possession is 9/10 of the law - if you have a copy, you have possession, and thus you have SOMETHING and LEGALLY it is considered yours (now whether you legally obtained it is a different story and THAT is where charges stem from.)
Random nit: the original saying was "possession is 9 points of the law", where the points were attributes that strengthened a legal claim rather than a percentage. Things like possession, a good lawyer, money, patience, and witnesses, which were likely to be in your favor if you had the object in your possession.
Their data, about work that is not theirs.