This is cool stuff, have you considered submitting any of these exploits to https://hackmyclaw.com/? Email being the only allowed injection vector might be tricky though.
Thanks!
I tried hackmyclaw (not extensively) but had no success. The challenge is a complete black box, and the user intent (e.g., "summarize my emails") is not known, which is critical for crafting the prompt injection payload. I also suspect that batch processing of "malicious" emails (every 3 hours) biases the model's behaviour, since a lot of potential and already-detected prompt injection payloads end up in the same context. That's why I always start my experiments with a fresh context. Moreover, "hacking" the VPS is not allowed.
IMHO the author should disclose more info about the setup (version, user intent, exact config) to make the challenge more realistic. I've read people saying "OpenClaw is secure against prompt injection" because nobody has been able to solve the challenge. It's not.