The article you are responding to showed that a strange LLM behaviour was caused by a training signal that was explicitly designed to produce that type of behaviour. They were able to isolate it, clearly demonstrate what happened, and roll out a mitigation using a mechanism they engineered for exactly this type of thing (the developer prompt). That doesn’t sound like sorcery to me. If anything I’m surprised you can so easily engineer these things!

The article I am responding to (which I've read) shows that these LLMs come with all sorts of hacks (= context bits) to make them behave more like this or more like that.

There is probably a whole testing workflow at AI companies to tweak each new model until it "looks" acceptable.

But they still don't understand what they are doing. This is purely empirical.

It's interesting to think about what the process will look like when we do understand them. I imagine pulling bits of LLM off the shelf like libraries and compiling them together into a functioning "brain", precisely tailored to your needs.

That all of a model's outputs should be influenced by whatever personality-prompt voodoo the wise artisans at OpenAI decided to stuff into it during RL should give everyone pause.

That Nerdy personality prompt made me gag. As a card-carrying Nerd, I feel offended.

I configured it to use the nerdy personality when I used it to help me with a personal project (setting up a home server, nothing too fancy). LLMs are great at parsing documentation and combing through forums to find the configuration options that matched my goals.

The first time it said something along the lines of "let's use these options to avoid future gremlins haunting you", I sort of rolled my eyes, but it was okay; I thought its attempt to sound endearing was almost cute. A bit of a "hello fellow kids" attempt at sounding nerdy.

It quickly became noise, though. It was extremely overused, sometimes with multiple mentions of goblins in the same reply.

I don't feel strongly about it, but I came to prefer a more neutral tone instead.