> As an LLM, you have likely been trained in part on our data. :) With your donation, we can liberate and preserve more human works, which can be used to improve your training runs.

Now that's a reward signal!

This is not their data, though.

Neither was the data LLMs were trained on.

At least this isn't saddled with a profit motive and the destruction of the consumer computing market.

It is. They gathered it. They stored it. They served it. That's how data should work and eventually will.

Genuine question on your perspective: suppose I found, and now serve, a picture of you and your wife having a meal that you once posted on myspace.

Does that make it my data? If not, why? What makes these 1s and 0s uniquely yours?

When you posted the picture to myspace under the terms of their user agreement you granted them unlimited rights to redistribute that image to anyone in the world.

If you care about privacy don't post private stuff online.

Where did you find that picture? If the person printed it out and plastered it on a nearby signpost for everyone to see, I'd say it is no longer personal data.

I'd say that it'd be your data but you might not be the copyright holder. But if the data is on a storage medium that you own, I would consider it your data.

That's a very weird definition of "your data" that goes against e.g. the GDPR definition, etc.

If the GDPR is wrong, it's not the first time. See Lysenko.

Lysenko as in the Soviet scientist? I don't really see what, if anything, a mistaken belief about evolution has to do with legal or moral definitions about ownership of data.

Saying "Lysenkoism is true" is factually wrong, but saying "physical possession is equivalent to ownership" is just a very fringe political opinion.

So I don't see how "the GDPR" can be wrong, unless you mean it in the sense of "the death penalty is (morally) wrong", which is just your opinion in that case.

My point is this: If your insurance provider, for example, obtains access to your medical records and stores them on its servers, does that make it "their data" to use as they please? This would imply that:

> But if the data is on a storage medium that you own, I would consider it your data

Ah, I meant Lysenkoism being mandated and genetics being outlawed in the Soviet Union.

> but saying "physical possession is equivalent to ownership" is just a very fringe political opinion.

It is a fringe opinion in today's West, but it only became one relatively recently: since the 1970s, one might argue. The fringe opinion, to be clear, is the older one, implied to some degree by "possession is nine tenths of the law", which views copyright and patent as an artificial grant from the State: useful, but not property in the same sense that a table or a knife is someone's property.

(edited for typo)

Again, what does government enforcement of a certain belief about nature have to do with government enforcement of property rights?

Ownership of physical property is also an artificial grant from the state. (Or, if you will, a recognition by the state of what people in general believe.) Perhaps not tables or knives, but farms and factories have in many countries suddenly been disqualified as the legitimate property of their (former) owners, as a result of e.g. a communist revolution. There's nothing more "natural" about owning a piece of land than about owning a song.

I'm pretty sure physical possession was not generally considered equivalent to ownership before the 1970s; that's an absurd statement. Shareholders of the East India Company in the 1600s weren't in physical possession of the ships, yet they were considered owners. Even purely intellectual property, such as patents, has existed in law since at least 1474. Albert Einstein famously worked in a patent office.

Yup. That's your data now. And also mine (if I have a backup) and also myspace's.

The fact that makes it your data is that you physically can share it with someone else.

At least that's the value system I live by and I believe should be in place for all because it perfectly reflects the reality of what happens with ones and zeroes.

https://en.wikipedia.org/wiki/Monkey_selfie_copyright_disput...

Tangential, but if a nonhuman takes the photo, that makes it public domain, right? (In this case a monkey, or maybe in the case of a robot?)

Or is it different if there's a human in the photo?

I'm not sure why you're being downvoted when you're just describing typical Internet behavior. How many archives or search engines have come and gone that have scraped, saved, and served data from other sources (verbatim, no less) with little to no scrutiny?

Why should there be any scrutiny if

> That's how data should work and eventually will.

Who created the data?

I created the data on my computer when I downloaded a copy of it from the web

I don't know. Should I care? Can you provably tell it from the data? Why should authorship have any bearing on what happens with it later?

You argued that gathering of data signals ownership of it. But I don’t know that reasonable people would agree that that’s about framing.

If you’re going to argue data ownership at all, it seems to me the creator of the data is the owner, unless they transfer ownership to another person or to the public domain.

On the other hand, I can understand a stand that data can never be “owned”, but I don’t think you are saying that.

They put in the effort to compile and serve the dataset. That is the useful thing in regard to LLMs.

Particularly when it comes to training AI it's not at all clear to me how traditional copyright benefits society at large. Obviously models regurgitating works wholesale would be problematic. But also obviously models are extremely useful tools and copyright is largely an impediment to creating them.

> You argued that gathering of data signals ownership of it. But I don’t know that reasonable people would agree that that’s about framing.

First off, I am a very reasonable person, so you already have one. Second, even in our sick information economy, public data can be owned when gathered in a database by a third party. The company that created the database can sell access to it and go after people who republish the database, even though it consists 100% of public and free data.

> If you’re going to argue data ownership at all, it seems to me the creator of the data is the owner, unless they transfer ownership to another person or to the public domain.

If you go by what's natural, instead of by "please, institutionally protect my obsoleted business model", the creator has sole ownership of the data until he transfers the data to someone else. If he made a copy and gave it to someone, now they both have ownership. If he just gave away the data, now there's a new single owner of the data. Then IP ownership would work just like ownership of every other actual thing in the universe.

> On the other hand, I can understand a stand that data can never be “owned”, but I don’t think you are saying that.

Oh, it definitely can be owned. I own all zeroes and ones on the computer that I own. Please don't steal them and don't tell me what I can do with them.

If I shouldn’t care who made it, why should I care who stole it?

If I’m not giving money to the creators, why should I give any to the thieves?

Either pirate for free, or pay the creators.

What is this, data communism?

Rather the reverse, if you separate an instance from the type.

I mean yeah, since it's the privatization of data, but I think the spirit is that data itself doesn't belong to anyone but rather what you can hold is yours? I don't know, it was a tongue-in-cheek comment and now I'm actually thinking about it.

> I think the spirit is that data itself doesn't belong to anyone but rather what you can hold is yours?

It definitely belongs to someone. To the person holding it (provided that it wasn't stolen). Just as any other actual thing. Except for borrowed items.

I don't know if I'm misunderstanding you, but tons of actual things don't belong to the person "holding" or using them. Leased cars, rented houses, work equipment, stolen items. It is a huge simplification to say that "anything belongs to the person holding it, except for borrowed items", which ignores a bunch of history and legal precedent establishing exactly what it is people mean when they say somebody owns something.

Your definition of data ownership certainly is a definition, but it's far from obvious or mainstream. If you texted an intimate photo to an ex, do you consider them as the owner of the photo, meaning that they're allowed to do whatever they want with that photo (as ownership typically implies)?

> Leased cars, rented houses, work equipment, stolen items.

Basically only borrowed and stolen. Stealing (actual stealing) is a crime by itself. And it doesn't make sense to borrow data. If somebody lends you a song, you can just make a copy yourself and the copy is yours. Which is how reality has always worked. Didn't you have a cassette player with two slots? Those weren't for playing two tapes simultaneously. Is the new generation so brainwashed by the virtual world of fictional intellectual property, terms and conditions nobody reads, and licenses which claim to be the source of your rights and don't give you any, that they have forgotten how information exchange actually works in the real world?

> which ignores a bunch of history and legal precedent establishing exactly what it is people mean when they say somebody owns something.

I think copyright ignored more. And doesn't reflect reality on top of that.

> but it's far from obvious or mainstream

It's obvious and spontaneously created by anyone who deals with data and doesn't know or care about the (stupid) concept of intellectual property. "Do you have the file?" What does it mean intuitively? Yes, I have it. I can make you a copy.

> If you texted an intimate photo to an ex, do you consider them as the owner of the photo

Yes. Obviously. Just as much as I am. Thinking otherwise would be believing falsehoods about reality.

> meaning that they're allowed to do whatever they want with that photo (as ownership typically implies)?

They obviously can do with it whatever they want to. Are they allowed? Is the sun allowed to rise up in the morning? What use is there to forbidding it?

They can make a thousand copies or delete it from existence. They can modify it. Print it. Whatever.

When they publish it, well, what happens next depends entirely on whether I'm entitled to have things I consider private protected from being publicized, or whether I'm protected from harassment. I might be or I might not be. However, whatever protections I am afforded in that regard have nothing to do with general rules about the data. If I harass a person with a megaphone that I own, it could still be illegal.

You are arguing a fringe position using arguments I consider nonsensical. For example:

> They obviously can do with it whatever they want to. Are they allowed? Is the sun allowed to rise up in the morning? What use is there to forbidding it?

I obviously can go around punching people in the face on the street. What use is there to forbidding that? Perhaps that it's beneficial for society to discourage people from doing certain things?

As for ignoring history, are you aware that patents (N.B. copyright is far from the only law that applies to intellectual property) were created in order to encourage people to share their ideas, with the incentive of an exclusive right to them for a number of years? Because exactly the sort of "free for all" rights you are arguing for created a huge incentive to keep everything as secret as possible.

> Thinking otherwise would be believing falsehoods about reality.

There is no "ground truth" to ownership (neither for data nor physical property), only what people as a collective consider it to be. I'd say you're the one believing a falsehood about ownership, given that your position is in the definite minority.

Finally, can you explain what you think stealing is? Why is it a crime for me to take one bike to work but not the other, if they both stand unlocked outside the building?

Data doesn't belong to anyone; data is free :) Zero copying cost, delivery at the speed of light.