As people will point out, the OSINT techniques described are nothing new. In the past, you could typically de-anonymize someone based on writing style or niche topics/interests. Total deanonymization can occur if any of these accounts link to profiles containing pictures of their faces, which can then be web-searched to link to a real identity. It's astounding how many people re-use handles that are very easily linked to their IRL identity on stuff like porn sites.
While people will point out this isn't new, the implication of this paper (and something I have suspected for 2 years now but never played with) is that this will become trivial, where it would previously have taken a human investigator a bit of time, even using common OSINT tooling.
You should never assume you have total anonymity on the open web.
If LLMs can identify a person across websites, I can ask an LLM to read his posts and write like him, impersonating him, and then this feeds back into the tools identifying him. I can probabilistically malign a person this way.
So this means deanonymization doesn't work? Rejoice?
This is already a thing people did at least as far back as when I started getting into web privacy, which was ~10 years ago. I have been the target of it before.
LLMs are probably better at it, but I don't know if this is as destructive as people may guess it would be. Probably highly person-dependent.
The micro-signals this paper discusses are more difficult to fake.
stylometry is only one aspect of de-anonymization. what you describe is certainly a threat that we will have to deal with, but there is a lot more to credible impersonation than just being able to mimic a writing style
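To make the stylometry point concrete, here is a minimal sketch of one classic style signal: character trigram overlap between two texts, scored with cosine similarity. This is not the method from the paper, just a toy illustration, and the function names (`char_ngrams`, `cosine`, `style_similarity`) are made up for this example.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams after lowercasing and collapsing whitespace."""
    text = " ".join(text.lower().split())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def style_similarity(text_a: str, text_b: str, n: int = 3) -> float:
    """Hypothetical helper: similarity of two texts by character n-gram profile."""
    return cosine(char_ngrams(text_a, n), char_ngrams(text_b, n))
```

A score near 1 only says the two n-gram profiles look alike. It says nothing about topic, timing, or any of the other signals mentioned in this thread, which is why mimicking a writing style alone is not enough for a credible impersonation.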
How to conduct a psy-op
https://youtu.be/YTGQXVmrc6g
I think the implication is this will become trivial and trivially automated, no human investigator needed. I bet there will be plugins in one year's time to right-click on a post and get a full report on who the author is.
agreed, and the new frontier here will probably be obfuscation by creating false positives with these same tools, but that kind of renders the web unusable in my mind.
I had this same thought. Seems fairly easy to just give off a strong false signal. If you don’t want anyone to know that you live in Finland, make a point to constantly mention how much you enjoy living in Peru.
Wouldn't it also become trivial to pretend to be another author?
it may become more trivial to llm your comments/blog/whatever into a different "voice", but there is so much that can be used for de-anonymization that the llm-assisted technique doesn't address.
for example, you may change the content of your comments, but if you only ever comment on the same topic, the topic itself is a signal. so are when you post (both day and time), frequency of posts, topics of interest, usernames (e.g. themes or patterns), and much more.
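as a rough illustration of the timing signal, here is a small sketch that turns each account's post timestamps into an hour-of-day histogram and compares the two with cosine similarity. the function names and the ISO timestamp format are assumptions made for this example, not anything from the paper.

```python
from collections import Counter
from datetime import datetime
from math import sqrt


def hour_histogram(timestamps: list[str]) -> list[float]:
    """Fraction of posts in each hour of the day (0-23), from ISO 8601 strings."""
    counts = Counter(datetime.fromisoformat(ts).hour for ts in timestamps)
    total = sum(counts.values())
    return [counts[h] / total if total else 0.0 for h in range(24)]


def histogram_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two hour-of-day histograms (0.0 if either is empty)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Made-up timestamps for three hypothetical accounts.
account_a = ["2024-01-05T22:15:00", "2024-01-06T23:40:00"]
account_b = ["2024-02-01T22:05:00", "2024-02-03T23:10:00"]
account_c = ["2024-02-01T09:00:00"]

sim_ab = histogram_similarity(hour_histogram(account_a), hour_histogram(account_b))
sim_ac = histogram_similarity(hour_histogram(account_a), hour_histogram(account_c))
```

rewriting your comments through an llm changes none of this: the two late-night accounts still line up with each other and not with the morning poster. on its own it is a weak signal (plenty of people post at 10pm), but it stacks with the others.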