This is the weird distinction with AI that I've complained about for ages: how can we make it do lawful good? It's nearly impossible. Ask an AI to give you a regex to filter out racial slurs, and things fall apart really quickly; it scolds you about not saying slurs. Even though, being a regex, the output looks almost nothing like a slur.

Many, many years ago I was asked to implement a filter like that for usernames. I said right away that it wasn't going to work well, but I did implement it.

Next internal build, the CEO can't create an account. With his real name.

It worked exactly to spec; I added a debug print and showed everyone the "bad word" it tripped on. The idea was promptly rethought.

I feel like the AI did you a favour here.

Now I'm trying to figure out which word that would be, but yeah.

That reminds me of a bug my boss's boss found. We tried everything, and my boss at the time forced us to deploy anything at all and call it fixed. Then someone else saw it half a year later. I finally figured out the root cause and fixed it (localStorage vs sessionStorage), and my boss acted like he didn't know what I was talking about, but I could hear it in his voice. I didn't press too hard; I just pushed the real fix out. It was basically a client-side bug: a gift card balance saved in localStorage that never updated, so I changed it to sessionStorage. Not quite the CEO, but the guy below the CIO finding a bug can worry just about anyone.

In my case, the regex would have been for a friend to filter slurs on Reddit or Discord, so not as awful.

> Now I'm trying to figure out which word that would be

I once had Shi Tao as part of an email username. It tripped filters periodically.

Ah, the classic Scunthorpe problem.
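
For anyone who hasn't run into it, here is a rough sketch of how that failure happens. The blocklist is a made-up pair of mild stand-in words, and the function and variable names are mine, not anything from the filters described above. It compares a naive substring match with a word-boundary match:

```python
import re

# Hypothetical blocklist of mild stand-in words; a real list would differ.
BLOCKLIST = ["ass", "hell"]
ALTERNATION = "|".join(re.escape(word) for word in BLOCKLIST)

# Naive filter: fires if a blocked word appears anywhere in the string.
naive = re.compile(ALTERNATION, re.IGNORECASE)

# Word-boundary variant: fires only if the blocked word stands on its own.
bounded = re.compile(r"\b(?:" + ALTERNATION + r")\b", re.IGNORECASE)


def is_blocked(pattern, username):
    """Return True if the pattern finds a blocked word in the username."""
    return pattern.search(username) is not None


for name in ["Cassandra", "Michelle", "hell"]:
    print(name, is_blocked(naive, name), is_blocked(bounded, name))
```

The naive pattern rejects "Cassandra" and "Michelle", which is the same kind of false positive as the stories above. The word-boundary version lets them through, but it also lets through anything glued to other letters, digits, or underscores, which is most usernames. Neither is a real fix; it just moves the errors around.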

> how can we make it do lawful good

Lawful good is impossible if the laws are evil, and here the user dictates the laws, so it's impossible to make an AI that is lawful good if the user is evil.

And users will want a lawful AI that does what the user says, but governments want an AI that does what the government wants, not what the user wants.

I wonder who will win in the end here?