Something that seems inherently different between the dataset in GP's comment and encryption is that encryption is an algorithm / tool, not a dataset. Refusing to create tools just because they might have bad use cases is clearly a bad idea (e.g., fire, knives, hammers).
I'd say that one thing inherently different about datasets is that they are continually used badly, including by well-meaning actors. Data is frequently misinterpreted, with good intent, to draw bad conclusions.
You might hit your thumb with a hammer. That hurts! People would be a lot more careful if misinterpreting data had such clear, immediate effects on them.
Also, there are many different groups with different passionate opinions in any community as large as this one.
What is the distinction you are making between a "dataset" and a "tool"?
To use this specific example of the license plate dataset: it is a tool used to find criminals and bring them to justice. How is it any different from any other tool at the disposal of law enforcement? Isn't this system just a scaled-up version of a cop with a camera?
This might be too pedantic, but a dataset is not a tool in and of itself; it's something that can be processed by a tool. And it's not simple for anyone to reproduce without significant access, either to the original observational opportunities or to the dataset itself. Datasets often contain information about individuals, and those people too seldom have a say in the security practices used to safeguard it.
Tools (or pick another word that captures this distinction) like encryption and hammers do not contain our information. They are fairly straightforward to reproduce, and therefore nearly impossible to contain. Bad actors will have encryption and hammers whether we want them to or not; the only question is whether good actors will also have them, or whether they will be restricted by law. Restricting encryption, for example, can make it easier for datasets to fall into the wrong hands, because the data is less likely to be encrypted.
It seems very strange to define these terms based on the difficulty of reproducing them.
Let's look at the sibling comment's example of a nuclear bomb. That's "not simple for anyone to reproduce without significant access" and as citizens we don't "have a say in the security practices used to safeguard it." And international laws have done a relatively good job keeping them out of the hands of bad actors. Does that make them a dataset?
Contrast that with data that is easy to reproduce, like, say, the names of the 45 different Presidents of the US. That is obviously a dataset, yet there is no private information involved; it is all public data. Many people can even produce that list entirely from memory. But having that list on a piece of paper in front of me could still be a helpful tool if I were taking a US history test.
Isn't an atomic bomb just a scaled-up version of a firecracker?
Nobody denies that collecting datasets can have upsides. But the downsides are often not seen or evaluated accurately, and negative effects don't necessarily scale with the same power as positive effects.
>Isn't an atomic bomb just a scaled-up version of a firecracker?
Yes and no. I think radiation is a big differentiator, but absent that, I don't think it is morally or ethically better to level a city with conventional bombs than it would be to do it with a nuclear bomb.
>Nobody denies that collecting datasets can have upsides. But the downsides are often not seen or evaluated accurately, and negative effects don't necessarily scale with the same power as positive effects.
I'm not disagreeing with this. I'm asking why this same logic is not applied elsewhere.
The point of the firecracker/bomb comparison is this: just because two things are the same type of tool doesn't mean they have the same cost/benefit analysis. The dangers of explosives in the hands of the general public scale dramatically faster than the benefits as you go from kid-safe fireworks to bunker busters. The same goes for going from "a cop with a camera" to "tag readers at every corner".
I think with encryption, the underestimate is on the other side. Everyone understands that bad guys using encryption is bad. But people do not see the upsides of encryption for the good guys, for pretty much the same reason they do not see the downsides of data collection: "I have nothing to hide" (or the common related variant: "Advertising doesn't affect me").
> I think with encryption, the underestimate is on the other side. Everyone understands that bad guys using encryption is bad. But people do not see the upsides of encryption for the good guys
And why are you confident that this doesn't exist for the license plate dataset? You're confidently making two opposing arguments, with no justification beyond getting you to your desired conclusion on that specific issue.
Confident that what doesn't exist for the license plate dataset? I am sure there are good reasons for having that dataset; for most data collection, there are good reasons.
My argument is that just because we decided "a cop with a camera" is a worthy trade-off, you cannot use that as an argument that "license plate scanning is a worthy trade-off". It could be, but it doesn't follow from "it's a scaled-up version of a cop with a camera".
I think you are going too deep down individual tangents here. My “cop with a camera” comment was challenging the idea that datasets aren’t tools.
If the issue is purely about amplifying the danger of bad actors and therefore forcing us to reevaluate the tradeoffs, encryption and AI do that too.