i just want to know where emdash came from, as it is quite rare to see it on the public internet, so it must have been synthetically added to the dataset.
The em-dash is very common in academic journals and professional writing. I remember my English professor in the early 2000s encouraging us to use it; it has a unique role in interrupting a sentence. Thoughtfully used, it conveys a little more editorial effort, since there is no dedicated key on the keyboard. It was disappointing to see it become associated with AI output.
Other than the things other comments already mention, let's not forget that Microsoft Word auto-corrects "--" to an em-dash, and so do (apparently - haven't checked myself) Outlook, Apple Pages, Notes and Mail. There's probably a bunch of other such software (I vaguely recall WordPress doing annoying auto-typography on me, some 15 years ago or so).
Because on the public internet people don’t have arts degrees, which is where em-dash users learn to wield it correctly.
I learned about em-dashes by reading Knuth about 40 years ago.
The very simplified answer is that the models are first trained on everything and are later trained more heavily on golden samples with perfect grammar, spelling, etc.
Although em-dashes are not common on the internet, they are prevalent in books.
Logo_Daedalus tended to use it a lot
https://xcancel.com/Logo_Daedalus
`---` in TeX?
It has been rare. It's common now, even in meaningful human texts. (I know because I detest the correct usage without spaces; it looks wrong.) One of the ways AI is shaping our minds.