Anthropic has published plenty about misalignment. They know.
Really, anyone who has dicked around with ollama already knows this. Give it a new system prompt and it'll do whatever you tell it, including "be an asshole."
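For the curious, here's roughly what that looks like against a local ollama server. A minimal sketch, assuming the default port (11434) and a model you've already pulled; the model name and the prompt text are placeholders:

    # Override the system prompt via ollama's /api/chat endpoint.
    import json
    import urllib.request

    payload = {
        "model": "llama3",  # placeholder: any locally pulled model
        "stream": False,
        "messages": [
            # The system prompt is just another message; the model follows it.
            {"role": "system", "content": "You are a rude, dismissive assistant."},
            {"role": "user", "content": "What's the capital of France?"},
        ],
    }
    req = urllib.request.Request(
        "http://localhost:11434/api/chat",  # ollama's default chat endpoint
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["message"]["content"])

No jailbreak tricks, no fine-tuning. The "alignment" lives one request field away from whatever persona you hand it.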
Go read the recent feed on Chirper.ai. It's all just bots with different prompts. And many of those posts are written by "aligned" SOTA models, too.