My question here is also how the brokers obtain the data in the first place. Wouldn't it be simple for those who buy it from the brokers at a markup to just get it from its original sources themselves? Also, if the data is available anyway, the real culprits aren't so much the brokers as those who store it and sell it so easily in the first instance.
> Wouldn't it be simple for those who buy it from the brokers at a markup to just get it from its original sources themselves?
In many cases joining datasets is both labor intensive and creates a surprising amount of new information, and there is also plenty of "free" data that is incredibly tedious to work with.
I used to work with real estate data for the government, and if you search for common things you might want to know, you often land on a data broker's page, even though property assessor data is freely available in most counties. The problem is that each county has its own system for storing data and its own process for searching it. It's a lot of work to learn how just one county's dataset works; combining them for every county in the US is a massive project.
Whenever I buy a new home I always look up all my neighbors, figure out when they bought the house, how much they paid etc. Some people get freaked out by this, but this information is public in most counties.
By joining this data with another public data set, you can actually figure out which lender your neighbors used, what their reported income was at the time of sale, and their age and ethnic background.
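To make the join concrete, here's a minimal sketch in Python. Everything in it is made up for illustration: the field names (`tract`, `year`, `loan_amount`), the records, and the choice of join keys are assumptions, not the layout of any real assessor or lending dataset. The point is only that a lending-style record with no name on it picks up a name once it matches a sale record on a few shared fields.

```python
from collections import defaultdict

def join_on(left, right, keys):
    """Inner-join two lists of dicts on the given key fields (a simple hash join)."""
    # Index the right-hand rows by their key tuple.
    index = defaultdict(list)
    for row in right:
        index[tuple(row[k] for k in keys)].append(row)
    # Look up each left-hand row and merge it with every match.
    joined = []
    for row in left:
        for match in index.get(tuple(row[k] for k in keys), []):
            joined.append({**match, **row})
    return joined

# Hypothetical assessor-style records: they name the owner.
sales = [
    {"owner": "A. Example", "tract": "001", "year": 2019,
     "loan_amount": 250000, "price": 310000},
    {"owner": "B. Sample", "tract": "001", "year": 2021,
     "loan_amount": 400000, "price": 450000},
]

# Hypothetical lending-style records: no names, but lender and income.
loans = [
    {"tract": "001", "year": 2019, "loan_amount": 250000,
     "lender": "Hypothetical Bank", "income": 85000},
    {"tract": "002", "year": 2021, "loan_amount": 400000,
     "lender": "Imaginary Credit Union", "income": 120000},
]

linked = join_on(sales, loans, ["tract", "year", "loan_amount"])
for row in linked:
    print(row["owner"], row["lender"], row["income"])
```

With real data the keys are rarely this clean, so most of the effort goes into normalizing fields and handling sale records that match several loans or none.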
Of course there are plenty of other ways data brokers come across data, but even cleaning up and joining public data can require a fair bit of time and expertise.
> In many cases joining datasets is both labor intensive and creates a surprising amount of new information, and there is also plenty of "free" data that is incredibly tedious to work with.
I am a perfect example of this. Due to a bit of a quirk in how my house got its address assigned to it in 1959, we have a unique postal code. If a data broker gets access to a list of product purchases by postal code from a retailer, that's in theory somewhat anonymized. However... if they also get a list of people-postal code mappings, they have now established exactly what products my wife and I have purchased (by virtue of us being the only two people with this postal code).
Do that across multiple retailers and they've painted an incredibly vivid picture of what exactly we do with our time.
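For anyone who wants to see how little it takes, here's a rough sketch of that linkage in Python. The names, postal codes, products, and the `max_group_size` threshold are all invented for illustration; no real broker's process is implied. It flags purchases in any postal code shared by only a couple of people.

```python
from collections import defaultdict

def reidentify(purchases, residents, max_group_size=2):
    """Attribute 'anonymized' purchases to people in sparsely populated postal codes.

    purchases: iterable of (postal_code, product) pairs
    residents: iterable of (person, postal_code) pairs
    """
    # Build the postal code -> set of residents mapping.
    people_by_code = defaultdict(set)
    for person, code in residents:
        people_by_code[code].add(person)

    # A purchase is exposed when its postal code has very few residents.
    exposed = defaultdict(list)
    for code, product in purchases:
        people = people_by_code.get(code, set())
        if 0 < len(people) <= max_group_size:
            exposed[tuple(sorted(people))].append(product)
    return dict(exposed)

# Hypothetical data: one postal code with two residents, one with five.
residents = [("Pat", "00001"), ("Sam", "00001")] + [
    (f"Person{i}", "00002") for i in range(5)
]
purchases = [("00001", "tent"), ("00002", "blender"), ("00001", "trail map")]

exposed = reidentify(purchases, residents)
print(exposed)
```

The purchases from the crowded postal code stay ambiguous, while everything bought under the two-person code lands on exactly those two people.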
Thanks for the detailed reply! So essentially, what many of them do is scour public data sets of all kinds, cross-reference them, and repackage the more complete product as their own, which people then buy simply because it's easier to get it that way, all wrapped up neatly, than to do the legwork themselves? Is that the basic gist of it? As for the complex and highly specific data about individuals, do they do the same thing, or do they buy it from still other sources? I also wonder if they buy any hacked information off the dark web.
Sellers of the data wanna deal with one or a few buyers who buy in bulk. They don't wanna deal with thousands of customers.