My suspicion is that the problem here is pretty simple: people publishing articles that contain these kinds of LLM-ass LLMisms don't mind and don't notice them.

I spotted this recently on Reddit. There are tons of very obviously bot-generated or LLM-written posts, but there are also always clearly real people in the comments who just don't realize that they're responding to a bot.

I think it's because LLMs are very good at tuning in to what the user wants the text to look like.

But if you're outside that, looking in, the text usually screams AI. I see this all the time with job applications, even from people who think they "rewrote it all".

You're tempted to think the LLM's suggestion is acceptable, far more so than if you had produced it yourself.

It reminds me of the Red Dwarf episode Camille. It can't be all things to all people at the same time.

People are way worse at detecting LLM-written short-form content (like comments, blogs, articles, etc.) than they believe themselves to be...

With CVs/job applications? I guarantee you, if you actually did a real blind trial, you'd be wrong so often that you'd be embarrassed.

It does become detectable over time, as you get to know a person's own writing style and so on, but it's bonkers that people still think they're able to make these detections on first contact. The only reason you can hold that opinion is because you're never notified of the countless false positives and false negatives you've had.
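To make that concrete with entirely made-up numbers: even if your gut were right 80% of the time in both directions, a quick back-of-the-envelope check (sketched below in Python, all figures assumed) shows how many of your confident calls would be quietly wrong.

```python
# Entirely made-up numbers: a base-rate check on "I can spot LLM-written CVs".
# Assume 30% of applications are LLM-written, and your gut is right 80% of the
# time in both directions (80% sensitivity, 80% specificity).
llm_rate = 0.30
sensitivity = 0.80   # chance you flag a genuinely LLM-written CV
specificity = 0.80   # chance you wave through a genuinely human-written CV

flagged_llm = llm_rate * sensitivity                  # true positives
flagged_human = (1 - llm_rate) * (1 - specificity)    # false positives

precision = flagged_llm / (flagged_llm + flagged_human)
print(f"Share of your 'that's an LLM' calls that are actually right: {precision:.0%}")
# ~63% -- more than a third of the people you'd silently write off are human,
# and since nobody ever corrects you, you never find out.
```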

There is a reason why LLMs keep producing the same linguistic patterns, like "it's not x, it's y" and numbered lists with emojis, etc... and that's because people have been doing that forever.

It is RLHF that dominates the style of LLM-produced text, not the training corpus.

And RLHF tends towards rewarding text that looks good at first blush. And for every one person (like me) who is tired of hearing "You're making a really sharp observation here...", there are 10 who will hammer that thumbs-up button.
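A toy way to see how that plays out, with invented vote counts: if preference feedback is aggregated across raters, the one annoyed reader is simply outvoted, and the style keeps getting reinforced.

```python
# Toy illustration with invented numbers: aggregate thumbs-up/thumbs-down
# feedback can keep rewarding a style that a minority actively dislikes.
votes_for_sycophantic_opener = [+1] * 10 + [-1] * 1   # 10 raters like it, 1 is tired of it
votes_for_plain_opener       = [+1] * 6  + [-1] * 5   # plainer text splits the room

def mean(votes):
    return sum(votes) / len(votes)

# A reward signal fitted to these preferences sees the sycophantic opener as
# clearly better, so the model gets pushed further in that direction.
print(f"sycophantic opener: {mean(votes_for_sycophantic_opener):.2f}")  # ~0.82
print(f"plain opener:       {mean(votes_for_plain_opener):.2f}")        # ~0.09
```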

The end result is that the text produced by LLMs is far from representative of the original corpus, and it's not an "average" in the derisory sense people say.

But it's distinctly LLM, and I can assure you I never saw emojis in job applications until people started using ChatGPT to write their personal statements.

> There is a reason why LLMs keep producing the same linguistic patterns, like "it's not x, it's y" and numbered lists with emojis, etc... and that's because people have been doing that forever.

They've been doing some of these patterns for a while in certain places.

We spent the first couple of decades of the 2000s training every "business leader" to speak LinkedIn/PowerPoint-ese. But a lot of people laughed at it when it popped up outside of LinkedIn.

But the people training the models thought certain "thought leader" styles were good, so they have now pushed that style much further and wider than ever before.

> They've been doing some of these patterns for a while in certain places.

This exactly. LLMs learned these patterns from somewhere, but they didn't learn them from normal people having casual discussions on sites like Reddit or HN or from regular people's blog posts. So while there is a place where LLM-generated output might fit in, it doesn't in most places where it is being published.

Yeah, even when humans write in this artificial, punched-to-the-max, mic-drop style (as I've seen it described), there's a time and a place.

LLMs default to this style whether it makes sense or not. I don't write like this when chatting with my friends, even when I send them a long message, yet LLMs always do, unless you tell them otherwise.

I think that's the tell. Always this style, always to the max, all the time.

Also, with CVs people already use quite limited and established language, with little variation across professional CVs. I imagine LLMs can easily replicate that.

> people publishing articles that contain these kinds of LLM-ass LLMisms don't mind and don't notice them

That certainly seems to be the case, as demonstrated by the fact that they post them. It is also safe to assume that those who fairly directly use LLM output themselves are not going to be overly bothered by the style being present in posts by others.

> but there are also always clearly real people in the comments who just don't realize that they're responding to a bot

Or perhaps many think they might be responding to someone who has just used an LLM to reword the post. Or translate it from their first language if that is not the common language of the forum in question.

TBH I don't bother (if I don't care enough to make the effort of writing something myself, then I don't care enough to have it written at all) but I try to have a little understanding for those who have problems writing (particularly those not writing in a language they are fluent in).

> Or translate it from their first language if that is not the common language of the forum in question.

While LLM-based translations might have their own specific and recognizable style (I'm not sure), it's distinct from the typical output you get when you just have an LLM write text from scratch. I often use LLM translations, and I've never seen them introduce patterns like "it's not x, it's y" when that wasn't in the source.

What is it about this kind of post that makes you guys recognize it as AI? I don't work with LLMs as a rule, so I'm not familiar with the tells. To me it just reads like a fairly sanitized blog post.

I see this by far the most on Github out of all places.

I am seeing it more and more here as well to be honest.

I called one out here recently with very obvious evidence - clear LLM comments on entirely different posts 35 seconds apart, with plenty of hallmarks - but soon got a reply "I'm not a bot, how unfair!". Duh, most of them are approved/generated manually; that doesn't mean it wasn't directly copy-pasted from an LLM without anyone even looking at it.