Many years ago (early 2000s) I worked for a firm that would help identify people who were doing "pump and dump" stock scams on Yahoo Finance message boards.
Step 1 was to scrape all of their posts into a database.
Step 2 was to have a human analyst review all of the posts for clues about who that person was
It was amazing that you could easily figure out:
- if they were at work or home from when they posted (9am to 5pm vs 6pm to 1am)
- what city they were in (based on sports teams, mentioning local landmarks etc0
- roughly what career they had
- their age based on cultural references
and mostly b/c they would drop a crumb of information here and there over months. They probably forgot about all of these individual events but when reading all of the posts in a few hours, the details became pretty evident. You get enough of these details and you can start to venn diagram people down to a few 100 likely candidates and then use LexisNexus style tools to narrow it down even further.
Given the above, it doesn't surprise me that LLMs can do the same but at high speed and across multiple sites etc.
Did you have a contract with SEC? Just wondering what kind of business would have an interest in that.