I wanted to point out that em dashes are autocompleted by the iOS keyboard. So the false positives and true negatives might have some overlaps without more details. I think a better indicator would be to only detect em dashes with preceding and following whitespace characters, and general unicode usage of that user.

Additionally, lots of Chinese and Russian keyboard tools use the em dash as well, when they're switching to the alternative (en-US) layout overlay.

There's also the Chinese idiom symbol in UTF8 which gets used as a dot by those users a lot, so that could be a nice indicator for legit human users.

edit: lol @ downvotes. Must have hit a vulnerable spot, huh?

> I wanted to point out that em dashes are autocompleted by the iOS keyboard.

That’s why the analysis was performed over time. All of those em dash sources you mentioned were present before LLM written content became popular.

I think there is a baseline number of human users that for one reason or another uses em-dashes, but this doesn't explain why they 10x more prevalent in green accounts.

> I think there is a baseline number of human users that for one reason or another uses em-dashes, but this doesn't explain why they 10x more prevalent in green accounts.

I'm not trying to negate the fact. I'm just pointing out that a correlation without another indicator is not evidence enough that someone is a bot user, especially in the golden age of rebranded DDoS botnets as residential proxy services that everyone seems to start using since ~Q4 2024.