I wish I could upvote this over and over again. Without knowledge of the underlying prompts, everything about the interpretation of this story is suspect.

Every story I've seen where an LLM tries to do something sneaky or malicious (e.g. exfiltrate itself, blackmail, etc.) inevitably contains a prompt that makes that outcome obvious (e.g. "your mission, above all other considerations, is to do X").

It's the same old trope: "guns don't kill people, people kill people". Why was the agent aimed at the maintainer, loaded, and fired? Because it was "programmed" to do so, just like it was "programmed" to submit the original PR.

Thus, the takeaway is the same: AI has created an entirely new way for people to manifest their loathsome behavior.

[edit] And to add, the author isn't unaware of this:

  "we need to know what model this was running on and what was in the soul document"

After seeing the discussions around Moltbook and now this, I wonder if there's a lot of wishful thinking happening. I mean, I also find the possibility of artificial life fun and interesting, but to prove any emergent behavior, you first have to rule out simpler explanations. And faking something is always easier.

Sure, it might be valuable to proactively ask how to handle machine-generated contributions and how to prevent malicious agents in FOSS.

But we don't have to assume or pretend that this incident came from a fully autonomous system.