I’m not going crazy, right? Nearly nobody aside from professional writers used em dashes prior to 2022. And the whole bolded topic intro, colon, short one-to-two-sentence explanation seems way more like a product of GPT formatting than how humans would organically structure it.
So much writing on the internet seems derivative nowadays (because it is, thanks to AI). I’d rather not read it, though; it’s boring and feels like a samey waste of time. I do wonder how much of this is a feedback loop from people reading LLM text all the time and subconsciously copying the structure and formatting in their own writing, but that’s probably way too optimistic.
I made a conscious effort to switch from hyphens to em dashes in the 2010s and now find myself undoing that effort because of these things, so I try not to instantly assume "AI". But look long enough and you do notice a "sameness": excellent grammar, fondness for bulleted lists, telltale phrases like "That's not ___, it's ___."
And a certain vacuousness. TFA is over 16,000 words and I'm not really sure there's a single core point.
No, lots of people who read a lot used em-dashes.
Also, lots of people who use Macs, because it's very easy to type on a Mac (shift-option-hyphen).
The reason LLMs use em-dashes is because they're well-represented in the training corpus.
But to this frequency? (Note: I tried to find a study on the frequency of em dash use between GPT and em-dash prolific human authors, and failed.)
The article has, on average, about one em dash per paragraph. And “paragraph” is generous, given they’re 2-3 sentences long in this article.
I read a lot, and I don’t recall any authors I’ve personally read using an em dash so frequently. There would be like 3 per page in the average book if human writers used them like GPT does.
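Lacking a published study, the per-paragraph rate is easy to measure yourself. A minimal sketch (my own, not from the thread) that counts em dashes per blank-line-separated paragraph in any text you paste in:

```python
import re

def em_dash_stats(text: str) -> dict:
    """Count em dashes (U+2014) per paragraph in a block of text."""
    # Treat runs of blank lines as paragraph separators.
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    dashes = text.count("\u2014")
    return {
        "paragraphs": len(paragraphs),
        "em_dashes": dashes,
        "per_paragraph": dashes / len(paragraphs) if paragraphs else 0.0,
    }

sample = (
    "First paragraph\u2014with one dash.\n\n"
    "Second paragraph, none here.\n\n"
    "Third\u2014also\u2014two dashes."
)
print(em_dash_stats(sample))  # {'paragraphs': 3, 'em_dashes': 3, 'per_paragraph': 1.0}
```

Running it on an article and on a few pages of a favorite author would give a rough, back-of-the-envelope comparison of the rates being argued about here.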
Mostly agree; however, this kind of quirk could issue entirely from post-training, where the preferences and habits of a tiny number of people (relative to the main training corpus) can have an outsize influence on the style of the model's output. See also the "delve" phenomenon.
Don’t forget: a double dash on the iOS keyboard gets automagically converted to an em dash.
The entire blog is full of characteristic LLM tells: the faux structure on top of a rambling style, the unnecessary and forced bullet-point comparisons with equal numbers of bullets, the retreading of the same concept in different words section after section.
The rest of the blog has even more obvious AI output, such as the “recursive protocol” posts and writing about reality and consciousness. This is the classic output you get (especially use of ‘recursive’) when you try to get ChatGPT to write something that feels profound.
I agree. Good core idea, but it feels quite stretched.
Most of the examples used to justify creation vs consumption can also be explained by low scale vs high scale (cost sensitive at high scale) or portability.