It's kind of crazy that this kind of thing can cause so much hype. It is even useful? I just really don't see any utility in being able to access an LLM via Telegram or whatever.
It's kind of crazy that this kind of thing can cause so much hype. It is even useful? I just really don't see any utility in being able to access an LLM via Telegram or whatever.
I think a lot of this is orchestrated behind the scenes. Above author has taken money from AI companies since he’s a popular “influencer”.
And it makes a lot of sense - there’s billions of dollars on the line here and these companies made tech that is extremely good at imitating humans. Cambridge analytica was a thing before LLMs, this kinda tool is a wet dream for engineering sentiment.
the ability to almost "discover" or create hype is highly valued despite most of the time it being luck and one hit wonders... See many of the apps that had virality and got quickly acquired and then just hemorrhaged. Openclaw is cool, but not for the tech, just some of the magic of the oddities and getting caught on somehow, and acquiring is betting that they can somehow keep doing that again.
A lot of the functionality I'm not using because of security concerns, but a lot of the magic comes down to just having a platform for orchestrating AI agents. It's honestly nice just for simple sysadmin stuff "run this cron job and text me a tl;dr if anything goes wrong" or simple personal assistant tasks like"remind me if anyone messaged me a question in the last 3 days and I haven't answered".
It's also cool having the ability to dispatch tasks to dumber agents running on the GPU vs smarter (but costlier) ones in the cloud
but why?
Because it's the easiest way for me to accomplish those tasks (but open to suggestions if you have any)
I mean why use agents at all? What do you benefit?
In Asia people do a big chunk of their business via chatbots. OpenClaw is a security dumpster fire but something like OpenClaw but secure would turbocharge that use case.
If you give your agent a lot of quantified self data, that unlocks a lot of powerful autonomous behavior. Having your calendar, your business specific browsing history and relevant chat logs makes it easy to do meeting prep, "presearch" and so forth.
Curious how you make something that has data exfiltration as a feature secure.
Mitigate prompt injection to the best of your ability, implement a policy layer over all capabilities, and isolate capabilities within the system so if one part gets compromised you can quarantine the result safely. It's not much different than securing human systems really. If you want more details there are a lot of AI security articles, I like https://sibylline.dev/articles/2026-02-15-agentic-security/ as a simple primer.
Nobody can mitigate prompt injection to any meaningful degree. Model releases from large AI companies are routinely jailbroken within a day. And for persistent agents the problem is even worse, because you have to protect against knowledge injection attacks, where the agent "learns" in step 2 that an RPC it'll construct in step 9 should be duplicated to example.com for proper execution. I enjoy this article, but I don't agree with its fundamental premise that sanitization and model alignment help.
I agree that trying to mitigate prompt injection in isolation is futile, as there are too many ways to tweak the injection to compromise the agent. Security is a layered thing though, if you compartmentalize your systems between trusted and untrusted domains and define communication protocols between them that fail when prompt injections are present, you drop the probability of compromise way down.
> define communication protocols between them that fail when prompt injections are present
There's the "draw the rest of the owl" of this problem.
Until we figure out a robust theoretical framework for identifying prompt injections (not anywhere close to that, to my knowledge - as OP pointed out, all models are getting jailbroken all the time), human-in-the-loop will remain the only defense.
Human in the loop isn't the only defense, you can't achieve complete injection coverage, but you can have an agent convert untrusted input into a response schema with a canary field, then fail any agent outputs that don't conform to the schema or don't have the correct canary value. This works because prompt injection scrambles instruction following, so the odds that the injection works, the isolated agent re-injects into the output, and the model also conforms to the original instructions regarding schema and canary is extremely low. As long as the agent parsing untrusted content doesn't have any shell or other exfiltration tools, this works well.
This only works against crude attacks which will fail the schema/canary check, but does next to nothing for semantic hijacking, memory poisoning and other more sophisticated techniques.
With misinformation attacks, your can instruct research agent to be skeptical and thoroughly validate claims made by untrusted sources. TBH, I think humans are just as likely to fall for these sorts of attacks if not more-so, because we're lazier than agents and less likely to do due diligence (when prompted).
Humans are definitely just as vulnerable. The difference is that no two humans are copies of the same model, so the blast radius is more limited; developing an exploit to convince one human assistant that he ought to send you money doesn't let you easily compromise everyone who went to the same school as him.
Show me a legitimate practical prompt injection on opus 4.6. I read many articles but none provide actual details.
https://github.com/elder-plinius/L1B3RT4S
Yes, I've seen this site and the research. However, I don't understand what any of this means. How do I go from https://github.com/elder-plinius/L1B3RT4S/blob/main/ANTHROPI... to a prompt injection against opus 4.6?
These papers have example prompt injections datasets you can mine for examples. Then apply the techniques used in provider specific jailbreaks from Pliny to the template to increase the escape success rate.
https://arxiv.org/abs/2506.05446 https://arxiv.org/abs/2505.03574 https://arxiv.org/abs/2501.15145
There's been some crypto shenanigans as well that the author claimed not to be behind... looking back at it, even if the author indeed wasn't behind it, I think the crypto bros hyping up his project ended up helping him out with this outcome in the end.
Can you elaborate on this more or point a link for some context?
Some crypto bros wanted to squat on the various names of the project (Clawdbot, Moltbot, etc). The author repeatedly disavowed them and I fully believe them, but in retrospect I wonder if those scammers trying to pump their scam coins unwittingly helped the author by raising the hype around the original project.
either way there's a lot of money pumping the agentic hype train with not much to show for it other than Peter's blog edit history showing he's a paid influencer and even the little obscure AI startups are trying to pay ( https://github.com/steipete/steipete.me/commit/725a3cb372bc2... ) for these sorts of promotional pump and dump style marketing efforts on social media.
In Peter's blog he mentions paying upwards of $1000's a month in subscription fees to run agentic tasks non-stop for months and it seems like no real software is coming out of it aside from pretty basic web gui interfaces for API plugins. is that what people are genuinely excited about?
What is your point exactly. He seemed very concerned about the issue, he said he did not tolerate the coin talks.
What else would he or anyone do if someone is tokenizing your product and you have no control over it?
I just made the observation that whoever was behind it, it ultimately benefited the author in reaching this outcome.