Why have the LLMs "learned" to write PRs (and other stuff) this way? This style was definitely not mainstream on GitHub (or Reddit) pre-LLMs, was it?

It’s strange how AI style is so easy to spot. If LLMs just follow the style that they encountered most frequently during training, wouldn’t that mean that their style would be especially hard to spot?

For this puzzle of "if LLM style was already the most popular (that's how LLMs work), then how come LLM style is so weird and annoying?", I have two theories.

First, "LLM style" did not even exist as such; it's a mash-up of several different styles, word choices, and phrases.

Second, LLMs have turned a slight plurality into 100% exclusivity.

Say there are 20 different ways to say the same thing. They are more or less evenly distributed, but one of them is slightly more common. The LLM picks the most common one. This means that:

   situation before : 20 options,  5% frequency each
   situation now    :  1 option, 100% frequency

LLM text both reduces the variety and drastically increases the absolute frequency of the winning option.
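A toy simulation of that collapse (greedy decoding as a stand-in for whatever averaging the model does; real models sample with temperature, and all the numbers here are made up):

```python
import random
from collections import Counter

# 20 near-equivalent phrasings; option 0 is only slightly more common
# than the rest (6% vs ~4.95% each)
weights = [0.06] + [0.94 / 19] * 19

def human_pick(rng):
    # humans sample roughly in proportion to real-world frequency
    return rng.choices(range(20), weights=weights)[0]

def llm_pick():
    # greedy decoding always takes the single most likely option
    return max(range(20), key=lambda i: weights[i])

rng = random.Random(0)
human_counts = Counter(human_pick(rng) for _ in range(10_000))
print(human_counts[0] / 10_000)  # ~0.06: option 0 barely stands out
print(llm_pick())                # always 0: a slim plurality becomes 100%
```

The slightly-favored option goes from appearing ~6% of the time in "human" text to 100% of the time under greedy decoding, which is the whole effect in miniature.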

I think these 2 theories explain how can LLM both sound bad, and "be the most common stye, how humans have always talked" (it isn't).

Also, if the second theory is true, that is, LLM style is not very frequent among humans, that means that if you see someone on the internet that talks like an LLM, he probably is one.

I understand there is an "Exclude Top Choices" (XTC) sampling algorithm which helps combat this sort of thing.
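For reference, a rough sketch of the XTC idea as I understand it (function and parameter names here are my own, not any library's actual API): with some probability, every token above a probability threshold except the least likely of them gets excluded, pushing the model off its most predictable phrasings while keeping it coherent:

```python
import random

def xtc_filter(probs, threshold=0.1, probability=1.0, rng=random):
    """Sketch of an 'Exclude Top Choices' sampling filter.

    probs: dict mapping token -> probability (sums to 1).
    With the given probability, remove every token at or above
    `threshold` EXCEPT the least likely of them, then renormalize.
    """
    if rng.random() >= probability:
        return probs  # filter not triggered on this step
    above = [t for t, p in probs.items() if p >= threshold]
    if len(above) < 2:
        return probs  # only one viable choice; nothing to exclude
    # keep only the least likely of the "top choices"
    keep = min(above, key=lambda t: probs[t])
    filtered = {t: p for t, p in probs.items()
                if t not in above or t == keep}
    total = sum(filtered.values())
    return {t: p / total for t, p in filtered.items()}

# Example: "delve" dominates, but several phrasings are viable
probs = {"delve": 0.6, "explore": 0.25, "examine": 0.1, "probe": 0.05}
print(xtc_filter(probs, threshold=0.1, probability=1.0))
# "delve" and "explore" are excluded; "examine", the least likely
# token above the threshold, survives and is renormalized
```

The effect is roughly the opposite of greedy decoding: instead of always taking the plurality winner, it is deliberately thrown away when safe alternatives exist.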

This is total speculation, but my guess is that human reviewers of AI-written text (whether code or natural language) are more likely to think that text with emoji check marks, or dart targets, or whatever, is correct. (My understanding is that many of these models are fine-tuned by humans who manually review their outputs.) In other words, LLMs were inadvertently trained to seem correct, and a little message that says "Boom! Task complete! How else may I help?" subconsciously leads you to think it is correct.

My guess is they were trained on other text from other contexts (e.g. ones where people actually use emojis naturally) and it transferred into the PR context, somehow.

Or someone made a call that emoji-infested text is "friendlier" and tuned the model to be "friendlier."

Maybe the humans in the loop were all MBAs who believe documents and PowerPoint slides look more professional when you use graphical bullet points.

(I once got that feedback from someone in management when writing a proposal...)

I suspect that this happens to be desired by the segment most enamored with LLMs today, and the two are co-evolving. I’ve seen discussions about how LM arena benchmarks might be nudging models in this direction.

AI sounds weird because most of the human reviewers are ESL.

You may thank millennial hipsters who used to think emojis are cute, and the proliferation of little JavaScript libraries authored by them on your friendly neighborhood GitHubs.

Later the cutest of the emojis paved their way into templates used by bots and tools, and it exploded like colorful vomit confetti all over the internets.

When I see this emojiful text, my first association is not with an LLM, but with a lumberjack-bearded hipster wearing thick-framed fake glasses and tight garish clothes, rolling on a segway or an equivalent machine while sipping a soy latte.

Everyone in this thread is now dumber for having read this comment. I award you no points and may god have mercy on your soul.

Joke's on GP: I give up reading most comments when I don't like them anymore, usually after 1-2 sentences.

I love how these elaborate stereotypes reveal more about the author than the group of people they are lampooning.

Welcome to the bottom, it's warm and cozy down here.

This generic comment reads like it's AI-generated, ironically.

It’s below me to use LLMs to comment on HN.

Exactly what an LLM would say.

Jk, your comments don't seem at all like AI to me. I don't see how that could even be suggested.

[flagged]

Beard: check

Glasses: check (I'm old)

Garish clothes: check

Segway: nope

So there's a 75% chance I am a millennial hipster. Soy latte: sounds kinda nice.

LLMs write things in a certain style because that's how the base models are fine-tuned before being given to the public.

It's not because they can't write PRs indistinguishable from humans, or can't write code without Emojis. It's because they don't want to freak out the general public so they have essentially poisoned the models to stave off regulation a little bit longer.

I doubt this. I've done AI annotation work on the big models. Part of my job was comparing two model outputs and rating which is better, and using detailed criteria to explain why it's better. The HF part.

That's a lot of expensive work they're doing, and ignoring, if they're just later poisoning the models!

GP is kind of implying that AGI is already here, and all the companies are just dumbing their models down for fear of regulation.

I'm like "Sure buddy, sure. And the nanobots are in all vaccines, right?"

this is WILD speculation without a citation. it would be a fascinating comment if you had one! but without? sounds like bullshit to me...

It is wildly speculative, but it's something I've never considered. If I were making a brave new technology that I knew had power for unprecedented evil, I might gimp it, too.

This sounds like the most plausible explanation to me. Occam's razor, remember it!

My impression is that this style started with Apple products. I distinctly remember opening a terminal, and many command-line applications (mostly JavaScript frameworks) were showing emoji in the terminal well before LLMs.

But maybe it originated somewhere else... in JavaScript libraries?

I thought it was JavaScript libraries written by people obsessed with the word "awesome", and separately the broader inclusivity movement. For some reason, I think people think riddling a README with emoji makes the document more inclusive.

> For some reason, I think people think riddling a README with emoji makes the document more inclusive.

Why do you think that? I try to stay involved in the accessibility community (if that's what you mean by inclusive?) and I've not heard anyone advocate for emojis over text.

It's really only anecdotal — I observed this as a popular meme between ~2015-2020.

I say "meme" because I believe this is how the information spreads — I think people in that particular clique suggest it to each other and it becomes a form of in-group signalling rather than an earnest attempt to improve the accessibility of information.

I'm wary now of straying into argumentum ad ignorantiam territory, but I think my observation is consistent with yours insofar as the "inclusivity" community I'm referring to doesn't have much overlap with the accessibility community; the latter being more an applied science project, and the former being more about humanities and social theory.

Could you give an example of the inclusivity community? I'm not sure I understand.

I mean the diversity and inclusion world — people focused on social equity and representation rather than technical usability. Their work is more rooted in social theory and ethics than in empirical research.

I do remember one example of an emoji in tech docs before all of this: learning GitHub Actions (which, based on my blog, happened in 2021 for me, before ChatGPT's release), at one point they had an apple emoji at the final stage saying "done". (I am sure there are others; I just do not remember them.)

But I agree: excessive emojis, tables of things, and general verbosity are tells for me these days.

I do recall emoji use getting more popular in docs and, brrh, in the outputs of CLI programs already before LLMs. I'm pretty sure that the trend originated in the JS ecosystem.

It absolutely was a trend right before LLM training started, but there is no way it was already the style of the majority of all tech docs and PRs.

The "average" style, from the Unix manpages of the 1970s through the Linux Documentation Project, all the way to the latest super-hip JavaScript isEven emoji-vomit README, must still have been relatively tame, I assume.

Really hate this trend/style. Sucks that it's ossified into many AIs. Always makes me think of young preteens who just started texting/DMing. Grow up!

I wonder if there's an analogy to the style of Nigerian e-mail scams, that always contain spelling errors, and conclude with "God Bless." If the writing looks too literate, people might actually read and critique it.

God Bless.

I wonder if it's due to emojis being able to express a large amount of information per token. For instance, the bulls-eye emoji packs a whole concept into 32 bits. Also, emojis don't have the language barrier.
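For what it's worth, the byte math: the bulls-eye emoji is U+1F3AF, which sits outside the Basic Multilingual Plane, so it takes four bytes in UTF-8 and a surrogate pair (also four bytes) in UTF-16:

```python
s = "\N{DIRECT HIT}"  # the bulls-eye emoji, U+1F3AF
print(hex(ord(s)))                 # 0x1f3af: one code point above U+FFFF
print(len(s.encode("utf-8")))      # 4 bytes in UTF-8
print(len(s.encode("utf-16-le")))  # 4 bytes in UTF-16: a surrogate pair
```

How many model tokens that costs depends entirely on the tokenizer, so "information per token" is a separate question from information per byte.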

RLHF and system prompt, I assume. But isn't being able to identify LLM output a good thing?

There's some research showing that LLMs fine-tuned to write malicious code (with security vulnerabilities) also become more broadly malicious (including claiming that Hitler is a role model).

So it's entirely possible that training in one area (e.g., Reddit discourse) might influence other areas (such as PRs).

https://arxiv.org/html/2502.17424v1

It reminds me of this, but without the logic and structure: https://gitmoji.dev/

Doesn't GitHub have emoji reactions? I would assume those tie "PR" and "needs emojis" closely together.

I'm glad that AI slop is detectable. For now, the repulsive emoji crap is a useful heuristic that someone is wasting my time. In a few years, once it is harder to detect, I expect I'm going to have a harder and more frustrating time. For this reason I hope people don't start altering their prompts to make the output harder to detect as LLM-generated for those of us with a modicum of intelligence left.

> Why have the LLMs "learned" to write PRs (and other stuff) this way?

They didn't learn how to write PRs. They "learned" how to write text.

Just like generic images coming out of OpenAI have the same style and yellow tint, so does the text. It averages down to a basic TikTok/Threads/whatever comment.

Plus whatever bias the training sets and methodology introduced.

That's my whole point: why does it seemingly "average down" to a style that was not encountered "on average" at the time LLM training started?

[deleted]
[deleted]