> If I code a var blah = 5*5; I know the answer is always 35

I greatly enjoy the irony here.

It's okay, we've replaced the Turing test with the em dash test

The em dash thing seems weird to me. The writing style guide for the college I attended as a freshman was big on them, and I never shook the habit. Not being able to easily conjure one was one of the biggest annoyances when I was forced to switch from macOS to Windows.

> Not being able to easily conjure one was one of the biggest annoyances when I was forced to switch from macOS to Windows.

I always install AutoHotkey if I have to use Windows for long periods of time. Interestingly, the bindings are so intuitive that I had actually come up with the _exact same_ bindings as macOS without knowing they existed. Imagine my surprise when I switched to a Mac and found out they were there natively!

I find the em dash thing weird as well. A bunch of people who didn’t know what an em dash was a couple of years ago decided that it’s a signature LLM move.

It depends on where you find it. A comment is highly unlikely to include careful punctuation such as semicolons, whereas for em-dash you need to do something extra, as it's not available on the keyboard as a single keystroke by default, so everybody is using a hyphen instead of an em-dash or en-dash.

However, a magazine article, or even a blog where the author cares, might include all of them: printer's quotes instead of straight ones, en/em dashes, the ellipsis as a single character, and many more. If suddenly half of the web is filled with shallow content dressed up in certain styling, people are right to feel something is not right.

> whereas for em-dash you need to do something extra

OPT+SHIFT+- on macOS. It's no more difficult to type than a lot of other punctuation/common symbols.

OK, that's macOS. On Windows you had to remember the arcane Numpad combination (provided you had a numeric keypad). That makes it uneven - the hyphen is just universal.

And on iOS it’s a long-press on the hyphen. It’s not inconvenient at all when you’re used to using them.

Very few humans go to the effort of using a true em dash in Internet comments (almost everyone just uses a hyphen), so it's a pretty good LLM indicator when paired with a certain writing style.

Until LLMs came around, I rarely saw other people use interrupting/parenthetical clauses at all, em dash or not. Kind of the same with semi-colons even. Or bold or subtle italics.

I’ve always enjoyed the style that em dashes and semi-colons add to a piece of writing, and that was what made me start using them. It was always notable to me when I noticed them in someone else’s writing, which was rare.

So are typos such as five times five is thirty—five.

A good reason to also start using em dashes wherever inappropriate.

But definitely not none— I use them in comments all the time, and have for decades. I find asinine observations conveyed with repetitive, circular wording to be a better indicator.

It just contrasts expectations of the unwashed masses with more professional writing.

If most people are used to reading social media and texts from their friends and maybe subtitles for movies, an em dash is practically never going to appear, and so when everyone and their dog start using them, well, it’s obvious something is up.

Whereas the more literate individual used to consuming writing for pleasure will have seen them regularly, and may even have employed them while writing.

I use them all the time. I get endless crap now for it lol

One of my first jobs was as the programmer/IT/graphics guy at a newspaper. Everybody there was required to use em-dashes properly and regularly, and followed other esoteric rules from the Associated Press Stylebook that also regularly appear in LLM output.

This highlights just how much unlicensed copyrighted material is in LLM training sets (whether you consider that fair use or not).

> This highlights just how much unlicensed copyrighted material is in LLM training sets (whether you consider that fair use or not).

Is there any licensed copyrighted material in their original training sets? AFAIK, they just scraped it all regardless of the license.

Inflation