The amount and extent of data that is available out there by brokers for purchase by literally any company is *mind-boggling*. However bad you think it is, multiply that by 10.
A colleague created a banner ad that was an image that had the text “told you I could do this mate!” and targeted an individual to prove a point.
The general public have no idea how much ad providers and data brokers know about them.
Seems just like retargeting in that case. Ask the “victim” to visit page A. Place a retargeting pixel on page A, and from then on you can display a message to that user anywhere on the Internet, as long as you are willing to pay a high price for that impression (a high price being way, way, way less than 0.1 USD).
Reminds me of the time when Signal (the private messaging app) tried to get ad data from Facebook and show it to users with a high degree of specificity, e.g. “You got this ad because you’re a middle aged woman who enjoys kpop and loves reading about Christopher Nolan”
Relevant article: http://archive.today/fzUL4
I need you to tell me how I do this right now. This will put so much cred into my spiels with people in meatspace. So many bricks will be shat!
Around 2014 I worked with recruiters and they had a tool that aggregated data on everyone through LinkedIn, Yelp, Twitter, GitHub, Eventbrite, etc. It was breathtaking the amount of information you could get on anyone, over 10+ years ago.
I’m guessing that, with the help of Palantir, the government has even more data and can probably link Reddit posts etc. based on stylometry, and can even perform psychological analysis of your personality and tendencies.
>stylometry
I really need to start using PocketPal (local LLM on Android) to restate my messages.
---
Oh, the places I'd like to send my texts so fine, With PocketPal, a tool that's truly divine, Local LLM on Android, a wondrous device to see, To help me restate my messages with glee! Wheee!
The government has been buying and funding R&D with data brokers since before Google existed.
> it was breathtaking the amount of information you could get on anyone, over 10+ years ago.
After being burnt by things taken from my social media out of context, used to publicly shame me, I locked down my social media
Am I "sweetly naive" to think that had an effect? I do think it did
Before I stopped using Facebook I noticed, over the last decade, that almost every account I encountered was locked down similarly
My point is I suspect it is getting harder, not easier, for data thieves. The golden age of data theft has passed. Maybe.
I work in this space - I'd say 1000x.
Could you elaborate with specifics? If it's this bad, why haven't we heard anything from a whistleblower or seen a good demo?
Because none of it is really unknown? People know about it and don't care. Hell, even people on this forum who should know better and care don't, or when they hear about stuff like this they think it's FB pixel or Google Analytics stuff. The simple fact is that with a few basic pieces of information on somebody, there's almost nothing that is sacred or not for sale. People mistakenly believe they're protected by adblockers and stuff, or by avoiding social media, but it is unavoidable while simply existing. The 1000x comment is because, from my POV, the scale of it is astounding and growing every year, and people really don't have a good understanding of the subtle and not subtle ways it can affect you, or when told, don't care/dismiss it. So I don't really feel like explaining it anymore. If more people understood, I'd also stand to profit quite a bit from it, so that's where my frustrated tone is coming from.
I'm pretty sure it was over when we switched to debit/credit cards. Everywhere you go, how much you buy, all that stuff has been sold for quite a while now.
People voluntarily used loyalty cards well before then.
I remember when loyalty cards first came to England. There were consumer rights shows on TV devoting entire episodes to the evils of their spying.
It’s amazing how much worse things have gotten, yet how people seem to care less now than they used to.
I wonder if it’s just consumers being so overwhelmed by their lack of control that they’ve become apathetic to the problem as a whole.
No, it was before this, with phone lines and wiretapping being forcibly allowed by law. As soon as we said "okay, you're allowed to record stuff if it's for a good purpose", it was over.
Cash is tracked as well; it's been over for a long time. Each bill has a serial # and it gets scanned going in and out of the bank. Yes, it's still marginally easier to launder cash, but if you just take it out of the ATM and spend it at a store it'll get tracked accurately.
I don't think this is as accurate as you are making out. Wawa (a convenience store in the Philly area) isn't tracking each $10 that goes in and out of the register. It could float all over the city before hitting a bank, and even then banks typically track serial numbers for large demonizations and when there's a suspicion of illegal activity. Happy to learn more about this if I have it wrong.
> demonizations
denominations, perhaps?
How would one find out what data brokers knew from their cash purchases?
Do banks sell this information? This bill was pulled from this ATM in Georgia by one Claudius McMoneyhands, and then deposited by one CashMoneyBusiness LLC in South Carolina three weeks later
Seems like there could still be intermediaries and a lack of what you actually bought with it at least?
Oh boy, don't give them any more ideas. This would work.
I suggested that this might be happening and had someone pretty quickly dismissing that the Chinese ATM maker here (oddly specifically, and happens to be my bank ATM here in Wyoming which I never stated), would put in the extra hardware for that. The hardware is mostly there already for imaging (how does the machine verify it's valid cash?) and it analyzes digits with a small neuronet for cheques (decades old tech). It's all there, just write some back-end stuff, and process bill images at a colocation if the ATMs don't have the horsepower to get the serial number locally.
Grocery store lets you draw $200 cashback out of their register.
My favorite example is the story about a data broker who, the day after 9/11 happened, went from the name "Muhammad" to a list of ~1K people which included 1 out of 4 of the 9/11 terrorists.
https://www.nytimes.com/2023/09/22/magazine/hank-asher-data....
Thanks for your perspective.
I'm aware that using adblockers and avoiding social media doesn't entirely prevent tracking, shadow profiles, and such, but surely it makes things more difficult for these companies, no? Or would you say that there's practically no difference between making an effort to preserve one's privacy and just giving up entirely?
> the subtle and not subtle ways it can affect you
In Manufacturing Consent they measured column inches in the NYT-- IIRC it was something like measuring the total that support the relevant U.S. administration's official position on given policy vs. inches that went against the gov't position. In any case, they were measuring column inches.
What were you measuring to come to your conclusion?
I don't really understand the point of this comment.
I really don't think they "know". They have an idea. But they really don't understand any sort of extent or implication.
If the FTC could do anything here to make this situation better, it would be to give every person access to any data about them that gets sold.
I could give you some great horror stories, but honestly I don't see the benefit in either potentially harming former coworkers of mine that still work at those places or landing myself in some sort of career/legal trouble for something people generally don't care about (other than a few points on HN).
If you were caught demoing something both horrific and internal you would risk serious damage to your career, and it would ultimately have zero impact on the industry, as there's just too much data out there and too much money wrapped up in it.
Plus, most people working with the data don't bother to look at it. The places I've internally demo'd massive privacy risks were shocked because they didn't realize what their own data was capable of. Most people are just writing jobs that run and shuffle data around from one place to another, never really asking "what is this data?" Even among data scientists I'm routinely surprised (so maybe I shouldn't be surprised) how frequently data scientists never do any real error analysis by looking at what the model got wrong and trying to understand why.
We hear about it all the time but no one cares.
I guess you were just distracted by all of the other house-on-fire crap going on.
https://therecord.media/ftc-complaint-against-kochava-unseal...
Among the additional information Kochava collects and sells are non-anonymized individual home addresses, phone numbers, email addresses, gender, age, ethnicity, yearly income, “economic stability,” marital status, education level, political affiliation and “interests and behaviors,” compiling and selling dossiers on individuals marketed as offering a “360-degree perspective,” the FTC said.
...
According to the FTC, Kochava’s data can identify women who visit reproductive clinics by name and address along with, for example, when they visit particular buildings, their names, email and home addresses, number of children, race and app usage.
...
Kochava marketing materials tell customers it offers “rich geo data spanning billions of devices globally” and that its location data feed “delivers raw latitude/longitude data with volumes around 94B+ geo-transactions per month, 125 million monthly active users, and 35 million daily active users, on average observing more than 90 daily transactions per device.”
...
The complaint also alleges that the company has lax procedures for determining who it is selling data to, saying purchasers are allowed to use a generic personal email address, label an alleged company as “self” and explain they plan to use the data for “business.”
And then there's this: https://therecord.media/data-brokers-are-selling-military-se...
I was on a team of about 25 involved in pitching a particularly large deal to a public sector client (think US state/local governments). The audience was about 50 people from different departments and agencies throughout the state, and our pitch team consisted of about 6-8 very big shots + me, the computer nerd. During our prep and rehearsals a "look book" was distributed, which consisted of write-ups on each person expected to be in the audience. It was very detailed, with a career and education history of each person, a personality analysis, where their interests/passions lie both at work and personally, and what topics and key points set them off. The deck was very professional and not something thrown together. I was impressed but a little taken aback too.
Cuz it's not really unknown nor is it illegal.
I know someone who bought the address of everyone with a specific first name.
> nor is it illegal
Where I live it is.
I simply don't believe you that all data brokers are completely and entirely illegal where you live.
Any way to combat it or stop your info from being overly harvested?
I asked this same thing in another comment here, but since you mention working in this space, I ask you directly. Where do the brokers obtain their data from? If it's easy for them to obtain, would those who buy it from brokers not be able to simply get it from its respective sources? I'm genuinely curious about how this dynamic works.
I would say that in general the HN crowd doesn't understand the industry at all, and they need to change the direction of their understanding, rather than the magnitude. Your basic hackernews believes that e.g. Google is out there selling all your personal information. But compared to these other industries the tech industry is almost airtight. It has long been possible for someone to pick up the phone and order, in any format they want, transaction data as narrowly targeted as they wish. Credit card line items for 35-year-old dentists living on the 400 block of Elm street in local town? By end of day.
This is correct; what people fundamentally misunderstand is that data brokers directly sell personal information about people, but Google and Facebook only allow for targeted advertising while keeping personal information within the confines of their company.
This isn't misunderstood, just not relevant. Google sells to a funnel that plays a numbers game, not for individuals to be targeted.
The meta-conspiracy-theory would be that the dossier industry whips up conspiracy theories about online advertisers in order to maintain their own low profile.
It has been truly frustrating when people will blame the "tech industry" for what is essentially reckless behavior from other industries. For a while, it was often the finance sector that did most of the crazy stuff. With crypto being an obnoxious overlap of the two.
It has been truly frustrating when arms dealers are punished for what is essentially reckless behavior from warlords, dictators, and drug cartels.
Data brokers are the OG tech industry. They've been around since the late 60s selling consumer data. Just because it's unsexy data storage and query work doesn't make it less tech.
I mean, somewhat fair. But when people decry "big tech," they aren't talking about these companies.
I'm also surprised that this is so hidden from everyone. Where are the engineers leaking secrets? Much of the online discourse is pure speculation based on what can be observed from the very end of the chain. (ie, what your computer is giving up) The speculation is not necessarily _incorrect_ but is too vague to be useful to anyone. Where does my data _actually_ go? Does anyone know? Can anyone describe the life of my data as it goes through the whole ecosystem? Does anyone know what mitigations are, and are not effective?
Because what's the headline you're going to get out of it?
If the headline is "Mark Zuckerberg is amassing your data and you know it's for evil", it's an easy sell. If it's "there's an ecosystem of little-known companies that sell transaction, location and lifestyle data to marketers, journalists, PIs, and police departments alike", it's not exactly the kind of a message that spurs people to action. And yeah, the newspaper that would be breaking the news is a customer too.
Despite being near universally hated externally, data brokering is a boring industry and is seen as very mundane and routine. They don't attract the type of engineers that have a strong moral stance and will go rogue and blow the whistle. They attract the middle age suburbanite just trying to get through the day and make a living.
I think the HN crowd is especially vocal about the tech industry in particular because that's the industry a lot of us have first-hand knowledge of - we know from personal observation that it is anything but airtight
Is that actually possible? Can we do a live test here?
Let's say we want this dataset: Credit card line items for 35-year-old dentists living on the 400 block of Elm street in local town
How much do I have to pay you to get it?
How much you got?
Never ask a salesperson how much you have to pay when the prices are not already clearly stated. Tell them how much you are willing to spend to see if they will do it for that amount. Salespeople will always shoot high, hoping to not leave money on the table. The price might change depending on how much you squeal and how high they shot. Your initial "willing to spend" should also be lower than you're actually willing to spend, for the same but converse reason.
Ok, so nobody here knows directly of any case where such data has been purchased, or vaguely similar, and we have no pricing information whatsoever available, but we are somehow completely knowledgeable about it being possible and how to do it? That sounds unlikely.
The supposedly in-the-know responses here are full of bravado but not much other than "trust me, bro"
https://news.ycombinator.com/item?id=44565878
Yea, you know everything, don't you.
Wow the Transunion business site, that really proves it huh.
Experian is known to sell the data they have. Why is this even in question? If I provide you Experian's website, you would give the same BS response?
Let me google this for you...
https://duckduckgo.com/?q=how+to+buy+data+from+a+data+broker...
The conversation was for buying transaction data from specific people, something that many seem to insist is easy and cheap and doable. Meanwhile if you actually read the responses to that search you smugly cited you'll find that no one seems to know how to actually do anything remotely like this. Yes this data is definitely harvested and it seems like you should be able to buy it in bulk from someone somewhere, but again no one seems to know where or how much or what the purchase minimum would be etc.
You asked for an example, one was given. If you’re saying you don’t know how to send an email to a business page with the products purporting services described here, no comment in this section can help you.
Yeah people fail to provide examples but continue to be doomers about how easy it is.
Been busy, but since you seem to be unable to find anybody by searching on your own for the past 6 hours, here's something I found with a quick little search.
https://datarade.ai/data-categories/food-grocery-transaction...
Have we really lost the ability to use search functionality??
Of course people do. 5 seconds spent doing the most sparse-ass research will help you find plenty of stuff. If people don't respond, I imagine it's for fear of 1) outing the specific area they work in, or 2) realizing these kinds of comments aren't generally made in good faith, so responding is generally a complete waste of time.
I'll waste my own time and give a trivial example just off the top of my head. Go peruse some of the products offered on this page, put on your thinking cap or even look into them further and imagine what kind of data those services provide, where it likely comes from, and where it is sold to, and you'll be well on your way - and those are just the ones that are advertised openly.
https://www.transunion.com/business
Pretty much every one of the big players people typically associate with other areas such as personal credit has some feet in this space somewhere. Then there's the hundreds of lesser-known fly-by-night guys that have their own DBs they build off of mostly what is the same data, but correlated in different ways and sold to different audiences.
There are many, many services offering data-for-sale on practically anything to practically anyone. I heard of one recently claiming it can reliably determine someone's porn preferences. The fact you personally have never come across it, or are saying you aren't, is only a data point that is interesting to you, and no one else that actually knows what they are talking about in this space. Hope this post helps you somehow.
I didn’t ask for a link to a company that can do it. I want pricing. I am saying that nobody here is willing to share anything even approaching specific pricing, which makes me very much doubt that any of them have the direct transaction experience they are claiming. I don’t doubt that underwater welding exists, but I do doubt that anyone in this thread has done it, or has any direct experience with it.
Literally all anyone is asking for is one single concrete example of a site where you can roll up and buy personal information.
>There are many, many services offering data-for-sale on practically anything to practically anyone. I heard of one recently claiming it can reliably determine someone's porn preferences.
Okay but then why not name at least a couple such services. Also, if the tech industry isn't selling data to them, where do they obtain it? Again, I see lots of ambiguity here, and the example link from transunion is hardly revealing of anything.
Credit card companies are known to sell data. https://www.cbsnews.com/news/mastercard-credit-card-customer...
Mobile service providers are known to have sold data. https://www.fcc.gov/document/fcc-fines-largest-wireless-carr...
Auto makers are known to sell data. https://www.caranddriver.com/news/a61711288/automakers-sold-...
You act like it doesn't happen, yet time and time again we learn about companies selling whatever data they can collect.
I can't believe we are still questioning this fact
What else do you need to know?
I think you misunderstand. I'm not doubting that it happens widely and pervasively. It's evident that this is the case. I just requested examples based on some of the very specific claims made here despite many ambiguities in how they were phrased.
Anyhow, thanks for taking the time to include some links.
For the most part, readers here are against it. Just because someone doesn’t know how to do it does not mean it is not doable. If it were not doable, these companies would not exist. I’ve already spent more time than I care to on the topic. So if you want to think that people are collecting the data and not selling it to interested parties, then, boy, I don’t know. You can only lead hostess to water, but you can’t make it drink.
Hostess? You mean horses, no?
>So if you want to think that people are collecting the data and not selling it to interested parties, then, boy, I don’t know.
As I very clearly said above, I don't doubt it at all, I was just asking for any clarification on who to whom.
And you were given them. So why keep up this persistent, obstinate line of questioning and persistent downvoting? It’s transparent and tired. Industry experts chime in on this stuff all the time; it isn’t anything done in backrooms, it's in the open. The only barrier to you knowing is your own ignorance.
But what type of range are we talking? Tens, hundreds, thousands?
It could also mean that if you have to ask... or the first rule of data brokering...
Seems like the first thing to do would be to get an account with one of these data brokers. I'd imagine most of these places are "contact us for pricing" so they can play used car salesman games
Or, you could ask John Oliver to do it for you and then tell all of us on one of his episodes exactly how in depth it could get. They have the money to do this, and it seems like something right in his team's wheel house
If you need John Oliver to do it maybe it's not such a big problem? If no one here is able to provide a single concrete example, maybe it's not real?
John Oliver likes to spend HBO's money to do things others can't do while entertaining the rest of us. I'm not spending my money on something to prove what is known as possible for you. At this point, even with receipts, you're coming across as someone that would argue that grass is not green, or water isn't wet, and fire isn't hot.
Just because someone doesn't answer your belligerent questions does not mean it's not possible. It probably means that the people doing this with first-hand knowledge have too much to do to bother trying to convert doubting Thomas over here.
All of this started because in response to an extremely concrete question, what's the cost of transaction data for a tightly constrained population, you replied with a smug non-answer about the greed of salespeople. These questions only got "belligerent" because every single answer has been nonsense insisting that it's super easy and cheap but also I couldn't possibly name a single site where this data is sold or provide even an order of magnitude of cost. Or maybe now it requires HBO levels of funding, who knows.
I offered sage advice on how to negotiate when you don’t know a firm price on anything whether that be data or a car or a home remodeling. If you want to say that advice was a smug answer then that’s on you. Every answer after has just gone further and further off the rails
Nah there's no way you actually watch John Oliver because that was really funny. Anyways, you mentioned earlier that we wouldn't believe you even if you posted receipts but that's actually exactly what we want to see. Like, just the name of a business, the thing that was sold, and the price.
I think it could be feasible to get an ad in front of "35-year-old dentists living on the 400 block of Elm street in local town" who have bought product X, but I've never seen a transaction-by-transaction purchase history being for sale.
> Your basic hackernews believes that e.g. Google is out there selling all your personal information.
I think most people here understand that Google sells ads against that data, but they aren't selling the data.
Any way to opt out of this type of data collection per company? I know for some things you can contact each individual broker and opt out (via some identifier like your email address) of your data being at least publicly available.
> Your basic hackernews believes that e.g. Google is out there selling all your personal information
To add to this, any mention of "telemetry" is taken to mean your PII being taken by bad actors to abuse, instead of what it is in 99% of cases, which is usage statistics. (X% of our users use feature A, it merits investment). It can be both, but there's usually no place for differentiation, just pitchforks.
> It can be both, but there's usually no place for differentiation
Fool me once, shame on you. Fool me 153,927,861 times, shame on me.
The place for differentiation, the place for "oh this is probably fine", the benefit of the doubt is, of course, lost.
Because someone (you? people shaped like you?) who misused telemetry destroyed trust.
> It can be both
should instead be "it usually is both and you the user have no way to know anyway."
The industry betrayed consumers' trust to the point where no project can be trusted to be mindful of data anymore. Even Proton Mail ended up ratting to the French, and that was just IP and session info, so who can we even trust to get "good telemetry"?
Logs aren't telemetry and calling a response to a court order "ratting out" is exactly the kind of behavior that makes people increasingly skeptical of privacy advocates.
> Even Proton Mail ended up ratting to the French,
Answering to court orders isn't "ratting". You either answer court orders or go to prison.
Or they architect their system better so that they never collect the IP addresses to begin with. I think Privacy Pass and other things Mullvad is doing help in this area, but I am not aware of Proton working with them to implement anything like this. But Proton should do this, because it’s relevant to customers of Proton.
https://discuss.privacyguides.net/t/privacy-pass-the-new-pro...
Apparently not Privacy Pass related, will keep looking as I seem to remember that Mullvad was doing that implementation, but I may remember incorrectly.
https://discuss.privacyguides.net/t/mullvad-has-partnered-wi...
I don't think it is common to refer to server logs as "telemetry".
> Credit card line items for 35-year-old dentists living on the 400 block of Elm street
I do not believe that. I would like evidence before I am convinced
If my bank is releasing that data I am horrified. I live in New Zealand and our privacy laws are clear: it would be illegal.
Backwards in 2025, ask for proof it is not happening. It’s the POS terminal that actually sells the data, btw. NZ may be “behind.”
Okay, and who are these people you contact for this data, and how do they themselves obtain it so precisely? You say the big tech industry is pretty air-tight about sharing data, so how does mysterious X company have on hand the credit ratings of all those youngish dentists on Elm street, among other kinds of information? How do these dynamics work, since you seem to know it internally?
A mobile provider enters into marketing sharing agreements with credit card companies. It extracts housing information from local property and tax records. It enters into marketing sharing agreements with retailers, payment processors like ADP. Same with license plate reading companies, loan companies, banks, professional organizations, etc.
It fills its data lakes with the vectorization and down tilt data that it collects every day. It uses federated batched Hadoop tasks to join the above data lakes into one large data lake. Mid-PB in size.
Then it looks for mobile phones that travel to the 400 block at night and stay there, that are buying dentist stuff from Walmart, travel to a dentist office every workday, have an income over $120k, and are a member of the local dentist society. Maybe look for someone with dentist student loans, graduated with a dental degree.
None of those data points can identify an individual. Taken together they can ID just about anybody.
But maybe there is a chance that you ID their wife/husband. So maybe include/exclude people that regularly visit OBGYN offices.
Back in the day we could link cell numbers to credit card purchases in locations to the point of being able to identify the name of the person, what they purchased, and where it was purchased. For all people in a metro area that were using credit cards and physically visiting stores.
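To make the "none of those data points can identify an individual, taken together they can" point concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the device IDs, the attribute names, and the set contents are synthetic, and it says nothing about how any real broker stores or joins its data. It only shows how intersecting a few broad attribute sets shrinks the candidate pool.

```python
# Entirely synthetic data: each set holds hypothetical device IDs that match
# one attribute. No single attribute names anyone by itself.
attribute_matches = {
    "overnights_on_400_block": {"d01", "d02", "d03", "d04", "d05", "d06"},
    "weekday_visits_to_dental_office": {"d02", "d05", "d09", "d11"},
    "income_over_120k": {"d02", "d03", "d05", "d07", "d09"},
    "dental_society_member": {"d05", "d09", "d12"},
}


def narrow(matches):
    """Intersect the attribute sets in order, recording how many candidates remain."""
    steps = []
    candidates = None
    for name, ids in matches.items():
        # The first attribute seeds the pool; each later one filters it.
        candidates = set(ids) if candidates is None else candidates & ids
        steps.append((name, len(candidates)))
    return candidates, steps


candidates, steps = narrow(attribute_matches)
for name, remaining in steps:
    print(f"after {name}: {remaining} candidate(s)")
print("remaining:", sorted(candidates))
```

In this toy data the pool goes from six devices to one after four filters. Real datasets are far noisier, which is presumably why extra include/exclude rules like the spouse check get layered on.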
What are some good cheap sources to get this? I have an art project idea that I've wanted to make that would require invasive data profiles, but it's a very big project and I have no idea where to start.
My question here is also how the brokers obtain the data themselves? Wouldn't it be simple for those who buy it from the brokers at a markup to just get it from its original sources themselves? Also, if the data is in any case available, the real at-fault culprits aren't so much the brokers as those who store and so easily sell it in the first instance.
> Wouldn't it be simple for those who buy it from the brokers at a markup to just get it from its original sources themselves?
In many cases joining datasets is both labor intensive and creates a surprising amount of new information, and there is also plenty of "free" data that is incredibly tedious to work with.
I used to work with real estate data for the government and if you search for any common things you might want to know you often land on a data brokers page even though property assessor data is freely available in most counties. The problem is each county has their own system of storing data and their own process for searching it. It's a lot of work to learn how just this one dataset works, combining this for all counties in the US is a massive project.
Whenever I buy a new home I always look up all my neighbors, figure out when they bought the house, how much they paid etc. Some people get freaked out by this, but this information is public in most counties.
By joining this data with another public data set, you can actually figure out which lender your neighbors used and what their reported income was at time of sale, plus their age and ethnic background.
Of course there are plenty of other ways data brokers come across data, but even cleaning up and joining public data can require a fair bit of time and expertise.
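A rough sketch of why the "each county has its own system" part is so tedious, in Python. The two county layouts, the field names, and the rows below are all made up for illustration and are not taken from any real assessor's export. The point is only that every source needs its own small adapter before the records can be combined or joined with anything else.

```python
from datetime import datetime

# Hypothetical raw rows from two county assessors with different layouts.
county_a_rows = [
    {"PARCEL_ADDR": "12 Oak St", "SALE_AMT": "$350,000", "SALE_DT": "03/15/2019"},
]
county_b_rows = [
    {"situs": "98 PINE AVE", "price": 412500, "sold": "2021-07-01"},
]


def from_county_a(row):
    """County A (assumed format): formatted currency strings and US-style dates."""
    return {
        "address": row["PARCEL_ADDR"].upper(),
        "sale_price": int(row["SALE_AMT"].replace("$", "").replace(",", "")),
        "sale_date": datetime.strptime(row["SALE_DT"], "%m/%d/%Y").date().isoformat(),
    }


def from_county_b(row):
    """County B (assumed format): numeric prices and ISO dates already."""
    return {
        "address": row["situs"].upper(),
        "sale_price": int(row["price"]),
        "sale_date": row["sold"],
    }


# One adapter per source; multiply this by every county to see the real cost.
adapters = [(county_a_rows, from_county_a), (county_b_rows, from_county_b)]
unified = [adapt(row) for rows, adapt in adapters for row in rows]

for record in unified:
    print(record)
```

Two adapters are trivial. Thousands of them, each with its own quirks and its own search process, is the kind of legwork a buyer would presumably rather pay someone else to do.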
> In many cases joining datasets is both labor intensive and creates a surprising amount of new information, and there is also plenty of "free" data that is incredibly tedious to work with.
I am a perfect example of this. Due to a bit of a quirk in how my house got its address assigned to it in 1959, we have a unique postal code. If a data broker gets access to a list of product purchases by postal code from a retailer, that's in theory somewhat anonymized. However... if they also get a list of people-postal code mappings, they have now established exactly what products my wife and I have purchased (by virtue of us being the only two people with this postal code).
Do that across multiple retailers and they've painted an incredibly vivid picture of what exactly we do with our time.
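The postal code scenario above can be sketched as a simple join, again in Python. The postal codes, resident labels, and purchases are synthetic, and the `max_household` threshold is a hypothetical parameter chosen just for this example. It shows how a purchase list keyed only by postal code stops being anonymous once a second list maps people to postal codes and some code has very few residents.

```python
from collections import defaultdict

# Synthetic "anonymized" retailer export: purchases keyed only by postal code.
purchases = [
    ("10001", "garden hose"),
    ("10001", "dog food"),
    ("10001", "baby formula"),
    ("99999", "blood pressure monitor"),
    ("99999", "hiking boots"),
]

# Synthetic people-to-postal-code list from a second, unrelated source.
residents = [
    ("Resident A", "10001"),
    ("Resident B", "10001"),
    ("Resident C", "10001"),
    ("Resident D", "10001"),
    ("Resident E", "99999"),
    ("Resident F", "99999"),
]


def reidentify(purchases, residents, max_household=2):
    """Return purchases attributable to postal codes with very few residents."""
    people_by_code = defaultdict(list)
    for name, code in residents:
        people_by_code[code].append(name)

    exposed = {}
    for code, names in people_by_code.items():
        # A code shared by many people hides each of them; a near-unique one does not.
        if len(names) <= max_household:
            items = [item for c, item in purchases if c == code]
            exposed[tuple(sorted(names))] = items
    return exposed


exposed = reidentify(purchases, residents)
for names, items in exposed.items():
    print(names, "->", items)
```

Here the shared code 10001 stays ambiguous, while everything bought under 99999 lands on the only two people who live there.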
Thanks for the detailed reply! So essentially, what many of them do is scour public data sets of all kinds, cross-reference them and repackage the more complete product as their own, which people then buy simply because it's easier to get it that way, all wrapped up neatly than doing the legwork? This is the basic gist of it? As for the complex and highly specific data about individuals, they do the same thing or do they buy from still other sources? I also wonder if they buy any hacked information off the dark web.
Sellers of the data wanna deal with one or a few buyers that buy in bulk. They don't wanna deal with thousands of customers.
Further, they are literally in the business of selling your data for a profit.
It should not be surprising that they are selling your data for a profit...