> In many cases joining datasets is both labor intensive and creates a surprising amount of new information, and there is also plenty of "free" data that is incredibly tedious to work with.

I am a perfect example of this. Due to a quirk in how my house was assigned its address in 1959, we have a unique postal code. If a data broker gets a list of product purchases by postal code from a retailer, that data is, in theory, somewhat anonymized. However, if they also get a list mapping people to postal codes, they have now established exactly what products my wife and I have purchased, because we are the only two people with this postal code.

Do that across multiple retailers and they've painted an incredibly vivid picture of what exactly we do with our time.
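
To make the join concrete, here is a minimal Python sketch of that kind of linkage. Everything in it is made up for illustration: the postal codes, names, products, and the `link_purchases` helper are all hypothetical, and a real broker's pipeline would presumably look quite different.

```python
from collections import defaultdict

def link_purchases(residents, purchase_lists, max_people=2):
    """Join person/postal-code pairs with "anonymized" purchase lists.

    Returns only the postal codes shared by at most `max_people` residents,
    since those are the ones where purchases point at specific people.
    """
    # Group people by postal code.
    people_by_code = defaultdict(list)
    for person, code in residents:
        people_by_code[code].append(person)

    # Pool purchases from every retailer, keyed by postal code.
    products_by_code = defaultdict(list)
    for purchases in purchase_lists:
        for code, product in purchases:
            products_by_code[code].append(product)

    return {
        code: {"people": sorted(people), "products": sorted(products_by_code[code])}
        for code, people in people_by_code.items()
        if len(people) <= max_people and code in products_by_code
    }

# Hypothetical data: one postal code with two residents, one with three.
residents = [
    ("Person A", "CODE-UNIQUE"),
    ("Person B", "CODE-UNIQUE"),
    ("Person C", "CODE-SHARED"),
    ("Person D", "CODE-SHARED"),
    ("Person E", "CODE-SHARED"),
]
retailer_a = [("CODE-UNIQUE", "hiking boots"), ("CODE-SHARED", "coffee"), ("CODE-SHARED", "dog food")]
retailer_b = [("CODE-UNIQUE", "tent"), ("CODE-SHARED", "garden hose")]

linked = link_purchases(residents, [retailer_a, retailer_b])
print(linked)
```

Neither input names a purchaser on its own, but the joined result ties the `CODE-UNIQUE` purchases from both retailers to exactly two people, while the `CODE-SHARED` purchases stay ambiguous.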