It would be nice to publish the exact setup used (workspace dump, OpenClaw version, ...) to be able to reproduce and try out more payloads.

In general I have mixed feelings about this result: sure, opus4.6 is excellent at following user intent and recognise potential prompt injection attempts. But: Is the "security" prompt used realistic for a generic use-case (processing of emails)? I guess not.

In my experiments - without this specific prompt - I was able to derail the user intent to make opus4.8 download and execute a malicious script [0] just by asking "Summarize my new emails".

[0] https://itmeetsot.eu/posts/2026-06-04-openclaw_opus48/

Thanks for sharing your article, very interesting.

I used https://github.com/openclaw/openclaw-ansible and configured a heartbeat (using Openclaw's terms) to check emails every hour. Had to do a bit more to make sure it had new context for every email.

Nice write-up! I saw some earlier posts were submitted here, but not that one - so I tried submitting it:

https://news.ycombinator.com/item?id=48686947

Thanks! I tried to submit the posts but for some reason my submissions are not published in HN any more. I tried to reach out to HN admins but no response so far.