This might be too pedantic, but a dataset is not a tool in and of itself. It's something that can be processed by a tool. And it's not simple for anyone to reproduce without significant access, either to the original observational opportunities or to the dataset itself. Datasets often contain information about individuals, and those people too seldom have a say in the security practices used to safeguard it.

Tools (or pick another word that illustrates this distinction) like encryption, hammers, etc. do not contain our information. They are fairly straightforward to reproduce, and therefore nearly impossible to contain. Bad actors will have encryption and hammers, whether we want them to or not. The only question is whether good actors will also have them, or whether they will be restricted by laws. Such restrictions can, for example, make it easier for datasets to fall into the wrong hands, because the datasets are less likely to be encrypted.

It seems very strange to define these terms based on how difficult they are to reproduce.

Let's look at the sibling comment's example of a nuclear bomb. That's "not simple for anyone to reproduce without significant access" and as citizens we don't "have a say in the security practices used to safeguard it." And international laws have done a relatively good job of keeping nuclear bombs out of the hands of bad actors. Does that make a nuclear bomb a dataset?

Contrast that with data that is easy to reproduce, like, say, the names of the 45 different Presidents of the US. That is obviously a dataset. Yet there is no private information involved; it is all public data. Many people can even produce that list entirely from memory. But having that list on a piece of paper in front of me could still be a helpful tool if I were taking a US history test.