No, lots of people who read a lot use em-dashes.
So do lots of people who use Macs, because an em-dash is very easy to type on a Mac (shift-option-hyphen).
The reason LLMs use em-dashes is that they're well-represented in the training corpus.
But at this frequency? (Note: I tried to find a study comparing how often GPT uses em dashes with how often em-dash-prolific human authors do, and failed.)
The article has, on average, about one em dash per paragraph. And “paragraph” is generous, given that they’re only 2-3 sentences each in this article.
I read a lot, and I don’t recall any authors I’ve personally read using an em dash so frequently. There would be like 3 per page in the average book if human writers used them like GPT does.
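For anyone who wants to check a rate like "one per paragraph" on their own text, here is a rough sketch in Python. It assumes paragraphs are separated by blank lines and counts only the Unicode em dash (U+2014), not double hyphens or en dashes; the function name and sample text are made up for illustration.

```python
import re

EM_DASH = "\u2014"  # Unicode em dash; "--" and en dashes are not counted


def em_dashes_per_paragraph(text: str) -> float:
    """Average number of em dashes per non-empty paragraph.

    Assumes paragraphs are separated by one or more blank lines.
    """
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    if not paragraphs:
        return 0.0
    return sum(p.count(EM_DASH) for p in paragraphs) / len(paragraphs)


# Hypothetical three-paragraph sample with 1, 0, and 2 em dashes.
sample = (
    "First paragraph \u2014 with one dash.\n\n"
    "Second paragraph, none.\n\n"
    "Third \u2014 has \u2014 two."
)
rate = em_dashes_per_paragraph(sample)
print(rate)
```

Running the same function over a chapter of a book and over the article would give a like-for-like comparison, though the result depends heavily on how paragraphs are split.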
Mostly agree; however, this kind of quirk could come entirely from post-training, where the preferences/habits of a tiny number of people (relative to the main training corpus) can have an outsize influence on the style of the model's output. See also the "delve" phenomenon.
Don’t forget: a double-dash on the iOS keyboard gets automagically converted to an em—dash.