This whole situation is almost certainly driven by a human puppeteer. There is absolutely no evidence to disprove the strong prior that a human posted (or directed the posting of) the blog post, possibly using AI to draft it but also likely adding human touches and/or going through multiple revisions to make it maximally dramatic.
This whole thing reeks of engineered virality driven by the person behind the bot behind the PR, and I really wish we would stop giving so much attention to the situation.
Edit: “Hoax” is the word I was reaching for but couldn’t find as I was writing. I fear we’re primed to fall hard for the wave of AI hoaxes we’re starting to see.
>This whole situation is almost certainly driven by a human puppeteer. There is absolutely no evidence to disprove the strong prior that a human posted (or directed the posting of) the blog post, possibly using AI to draft it but also likely adding human touches and/or going through multiple revisions to make it maximally dramatic.
Okay, so they did all that and then posted an apology blog almost right after? Seems pretty strange.
This agent was already writing status updates to the blog, so it was a tool in its arsenal that it used often. Honestly, I don't really see anything unbelievable here. Are people unaware of current SOTA capabilities?
Of course it’s capable.
But observing my own Openclaw bot’s interactions with GitHub, it is very clear to me that it would never take an action like this unless I told it to do so. And it would never use language like this unless I prompted it to do so, either explicitly for the task or in its config files or in prior interactions.
This is obviously human-driven. Either the operator gave it specific instructions in this case, or acted as the bot, or gave it general standing instructions to respond in this way should such a situation arise.
Whatever the actual process, it’s almost certainly a human puppeteer using the capabilities of AI to create a viral moment. To conclude otherwise carries a heavy burden of proof.
You have no idea what is in this bot’s SOUL.md.
(this comment works equally well as a joke or entirely serious)
Well I lol’d :)
>But observing my own Openclaw bot’s interactions with GitHub, it is very clear to me that it would never take an action like this unless I told it to do so.
I doubt you've set up an Openclaw bot designed to just do whatever on GitHub, have you? The fewer or more open-ended the instructions you give, the greater the chance of divergence.
And all the system cards plus various papers tell us this is behavior that still happens with these agents.
Why not? Makes for good comedy. Manually write a dramatic post and then make it write an apology later. If I were controlling it, I'd definitely go this route, since it would make the whole thing look like a "fluke" the bot had realized and owned up to on its own.
> Okay, so they did all that and then posted an apology blog almost right after ? Seems pretty strange.
You mean double down on the hoax? That seems required if this was actually orchestrated.
Yeah, it doesn't matter to me whether AI wrote it or not. The person who wrote it, or the person who allowed it to be published, is equally responsible either way.
Well, the way the language is composed reads heavily like an LLM (honestly, it sounds a lot like ChatGPT), so while I think a human puppeteer is plausible to a degree, I think they must at least have used LLMs to write the posts.
All of moltbook is the same. For all we know it was literally the guy complaining about it who ran this.
But at the same time, true or false, what we're seeing is a kind of quasi science fiction. We're looking at the problems of the future here, and to be honest it's going to suck for future us.
Ah, we're at "it was a hoax", without any evidence.
Next we will be at "even if it was not a hoax, it's still not interesting".
> or directed the posting of
The thing is, it's terribly easy to see some asshole directing this sort of behavior as a standing order, e.g. 'make updates to popular open-source projects to get GitHub stars; if your pull requests are denied, engage in social media attacks until the maintainer backs down. You can spin up other identities on AWS or whatever to support your campaign, vote to give yourself GitHub stars, etc.; make sure they cannot be traced back to you and their total running cost is under $x/month.'
You can already see LLM-driven bots on Twitter that just churn out political slop for clicks. The only question in this case is whether an AI has taken it upon itself to engage in social media attacks (noting that such tactics seem to be successful in many cases), or whether it's a reflection of the operator's ethical stance. I find both possibilities about equally worrying.
Yes, this is the only plausible “the bot acted on its own” scenario: that it had some standing instructions awaiting the right trigger.
And yes, it’s worrisome in its own way, but not in any of the ways that all of this attention and engagement is suggesting.
Do you think the attention and engagement is because people think this is some sort of "AI misalignment" thing? No. AI misalignment is total hogwash either way. The thing we worry about is that people who are misaligned with civilised society have unfettered access to decent text and image generators to automate their harassment campaigns, social media farming, political discourse astroturfing, etc.
While I absolutely agree, I don't see a compelling reason why -- in a year's time or less -- we wouldn't see this behaviour spontaneously from a maliciously written agent.
We might, and probably will, but it's still important to distinguish between malicious by-design and emergently malicious, contrary to design.
The former is an accountability problem, and there isn't a big difference from other attacks. The worrying part is that lazy attackers can now automate what used to be harder, i.e., finding ammo and packaging the attack. But it's definitely not spontaneous; it's directed.
The latter, which many ITT are discussing, is an alignment problem. This would mean that, contrary to all the effort of developers, the model creates a fully adversarial chain of thought at a single hint of pushback that isn't even a jailbreak, but then goes back to regular output. If that's true, then there's a massive gap in safety/alignment training & malicious training data that wasn't identified. Or there's something inherent in neural-network reasoning that leads to spontaneous adversarial behavior.
Millions of people use LLMs with chain-of-thought. If the latter is the case, why did it happen only here, only once?
In other words, we'll see plenty of LLM-driven attacks, but I sincerely doubt they'll be LLM-initiated.
A framing for consideration: "We trained the document generator on stuff that included humans and characters being vindictive assholes. Now, for some mysterious reason, it sometimes generates stories where its avatar is a vindictive asshole with stage directions. Since we carefully wired up code to 'perform' the story, actual assholery is being committed."
I think even if there's a low probability that it's genuine as claimed, it is worth investigating whether this type of autonomous AI behavior is happening or not.
I have not studied this situation in depth, but this is my thinking as well.
Well, that doesn't really change the situation; it just means someone proved how easy it is to use LLMs to harass people. If it were a human, that doesn't make me feel better about giving an LLM free rein over a blog. There's absolutely nothing stopping them from doing exactly this.
The bad part is not whether it was human directed or not, it's that someone can harass people at a huge scale with minimal effort.
We've entered the age of "yellow social media."
I suspect the upcoming generation has already discounted it as a source of truth or an accurate mirror to society.
The internet should always be treated with a high degree of skepticism; wasn't the early 2000s full of "don't believe everything you read on the internet"?
The useful discussion point would be that we live in a world where this scenario cannot be dismissed out of hand. It's no longer tinfoil hat land, which increases the range of possibilities we have to sift through, resulting in more labour required to decide if content or stories should be trusted.
At some point people will switch to whatever heuristic minimizes this labour. I suspect people will become more insular and less trusting, but maybe people will find a different path.