Are prompt injections solved? If OpenClaw is only useful when it has access to your digital life, then why does it matter where it runs? You might as well be asking me to keep my dead man's switch safely on the moon. If you find this software useful, you are sharing a count down to a no good very bad day with everyone else who finds it useful. One zero day prompt injection technique, your e-mail on a distribution list, and that's all she wrote.

The way I solved this was that my open claw doesn't interact directly with any of my personal data (calendar, gmail, etc).

I essentially have a separate process that syncs my gmail, with gmail body contents encrypted using a key my openclaw doesn't have trivial access to. I then have another process that reads each email from sqlite db, and runs gemini 2 flash lite against it, with some anti-prompt injection prompt + structured data extraction (JSON in a specific format).

My claw can only read the sanitized structured data extraction (which is pretty verbose and can contain passages from the original email).

The primary attack vector is an attacker crafting an "inception" prompt injection. Where they're able to get a prompt injection through the flash lite sanitization and JSON output in such a way that it also prompt injects my claw.

Still a non-zero risk, but mostly mitigates naive prompt injection attacks.

I've never used OpenClaw but as I understand it, it has a way of keeping a pseudo memory for context? That alone would be interesting, even if it was only allowed to read the generic internet. Like having a little robot buddy that remembers you and past conversations. Maybe you could have it give you reminders and stuff like you'd do with Alexa?

It basically writes a bunch of notes as markdown files and then injects them as part of its prompts. I saw someone compare it to that movie Momento, where the protagonist can’t form new memories so he tattoos notes all over his body.

That sounds like a good comparison.

It's a bit like the xkcd where the admin account is secure but all the useful information is in the user account anyway.

https://xkcd.com/1200/

It's not a soluble problem, at least not completely. The big frontier models are better at resisting prompt injection, but any LLM is vulnerable to some degree. If you give it access to arbitrary inputs like the web and to your personal data, there's a risk it'll disclose stuff you don't want it to.

It's annoying, because I love OpenClaw as an idea, but I don't trust it enough to give it what it needs to be useful.

IDGI. It is reading emails, which is a vector for prompt injection. It is also reading emails, which is where all password resets are sent to. Anyone granting even read access to their primary email is playing with fire.

I personally don't see how the daily briefings or whatever are worth the risk.

Your AI should not be in a position to submit a resignation email or send a text to your partner asking for a divorce.

As long as the email or text includes the disclaimer "generated with the assistance of artificial intelligence" then you should be fine.

It is a charming solution that addresses the optics with great efficiency while leaving the rot entirely undisturbed. By all means, let us proceed if the goal is to feel busy. But when the inevitable occurs, please ensure you have a second, more serious suggestion ready.

Disclaimer: generated with the assistance of artificial intelligence

Hi $wife,

You're absolutely right. Let's divorce.

I've emailed every lawyer in the state.

--

Sent from my OpenClaw

You forgot the /s... at least I hope you did.

but if that disclaimer means that you have to verify whether or not the "sender" agrees with the content that defeats its purpose, no? if we are all going to be like "did you mean to send this text/email...?"

I'm surprised people don't use Lima (quick headless local VMs where you can mount a folder). [1]

[1] - https://lima-vm.io/docs/examples/ai/

What's the difference between lima and vagrant?

According to _looking it up_ Lima is tailored to macOS. I encourage to look it up yourself!

I'd add using Discord as your chat channel to limit access to your contacts, and isolating access to personal data via mcp servers https://www.appsoftware.com/blog/openclaw-running-a-secure-c...

It's really not that hard to run them in docker. Can give them a nestybox (with a little work) sidecar so they can run docker-in-docker. As far as permissions, the only mental model that makes sense to me is treating them like actual people. Bound their permissions in the other systems not on their own machines, basically zero trust. For instance for email, most mail apps have had delegated permissions for a while, executives use it to have their assistants read and write their mail. That's what is needed with these too.

You still have to trust your executive assistant. I would never give someone I don't trust the ability to read and write emails for me.

If this takes off, I wonder if platforms will start providing API tokens scoped for assistants. They have permissions for non destructive actions like reading mails, flagging important mails, creating drafts, moving to trash, but not more.

How does my email platform know which messages I want my agent to see and which are too sensitive?

I don't see how it's possible to securely give an agent access to your inbox unless it has zero ability to exfiltrate (not sending mail, not making any external network requests). Even then, you need to be careful with artifacts generated by the agent because a markdown file could transmit data when rendered.

The recent releases of OpenClaw have made running it on docker/podman much easier. I've been running it on a stand alone Lenovo Thinkcentre running inside docker. For my needs the setup works well. There are some limitations like hardware and filesystem access with my workstation (macbook) but largely solvable and I like the isolation. For locking it down further, particularly on the network level someone recently released https://nono.sh/ which seems promising. I've been using https://clawchat.dev/ on my macbook for chatting with the openclaw agent. It is rough around the edges but gets the job done.

Put it in a box and then give it read write access to all your valuable data. That'll do it....

Because the VM isn't there to protect your data, it's to give the AI a space where it can do things that would be annoying or cause breakages on your own machine. It also gives you an easy save/restore mechanism.

I recently installed Zeroclaw instead of OpenClaw on a new VPS(It seems a little safer). It wasn’t as straightforward as OpenClaw, but it was easy to setup. I added skills that call endpoints and also cron jobs to trigger recurrent skills. The endpoints are hosted on a separate VPS running FastAPI (Hetzner, ~$12/month for two vps).

I’m assuming the claw might eventually be compromised. If that happens, the damage is limited: they could steal the GLM coding API key (which has a fixed monthly cost, so no risk of huge bills), spam the endpoints (which are rate-limited), or access a Telegram bot I use specifically for this project

I think this might be exaggerated, but some possibly relevant humor: https://use-a-vps.exe.xyz/

instead of me doing 'pip install skypilot' in a terminal, why doesn't skypilot make a skypilot smartphone app that will provision the cloud resource? then could even get rid of the whatsapp/telegram dependency by making the app a messaging client (to communicate with the openclaw server)

It's hilarious watching people discover security again. Everyone plugging their favorite sandbox technology. Yes, sand boxing processes is a thing that has existed for a long time and there are a million tools that do it. Systemd has it built in for example. Even claude code itself has sandboxing and permissions built in.

Process isolation is not the danger with OpenClaw. Giving an LLM access to all your shit is the problem. My solution is to treat it like a human, give it it's own accounts, scoped to what you want it to do and accept the risks associated with that. If I had a human assistant I wanted to read my email, I'd set up an inbox for them specifically and forward what I want them to screen. I don't use OpenClaw, but have a similar harness I built that runs as an unprivileged Linux user with access to just what I want it to access.

I know it's not in vogue to actually know how technology works anymore, but we have literally decades worth of technology solutions for authentication/authorization, just fucking use it.

[deleted]

To be honest, anyone with a Claude Code subscription can just write their own in moments. My own assistant has its own email address and Apple ID and interacts primarily via a Telegram bot. I share my calendar with it and my email syncs down and is indexed, but it sends email via its own Gmail account.

The interesting part about OpenClaw is that if you give a world-class model an arbitrary number of skills then emergent behavior mimicking intelligent assistance appears. The structural pieces of that are just long-term memory, an agentic loop, a messaging system, and self-modification.

You can get something quite functional out of:

* A memory.md

* A hand-rolled agent loop (this is just "keep calling till num tries exhausted or agent says stop") - claude knows how to write openai function call syntax and codex tool call syntax

* A Telegram bot

* Access to a persistent filesystem for it to build itself skills

It can be quite expensive to run, but a trick that is supported[0] is to use a Codex subscription by getting a codex cli token and using that. OpenAI explicitly supports this, so you can just use it.

You can try to make improvements to this structure in all sorts of ways using all sorts of tools and get somewhere but this much is all you need. You really have to just give yourself 2 hours with Claude Code and a similar prompt to get somewhere. This is the first time in history that personal software has been this accessible to everyone.

0: someone here told me about it https://news.ycombinator.com/item?id=47151310

I think nanoclaw is actually designed to be run that way.

Firejail seems like the right tool for a somewhat complicated desktop application that you want isolation for, that's not simple to containerize.

most people want openclaw to access their personal files, thats the big use case.

Can't these claws build their own personalities, and along with it their own personal files?

The claw community is clearly not thinking big enough.

serious question why anyone on hn would run this?

Both replies to your question give you the two sides. It is a scary, stupid thing to give your house keys to, but it is also very interesting like two trains crashing.

Maybe a middle ground would be isolating it like the article suggests, and poking it with a stick (giving it limited, or newly created accounts) to see what it can do?

Right? It's asking for trouble.

I was in the repebble comments a few days ago and this person rolled their own for very obvious reasons: https://news.ycombinator.com/item?id=47078454

For me at least its an interesting project I can take apart and build on top of. I've built 100% my own agent frameworks from scratch and have learned a lot from them. There is something to be said on learning from others projects as well, also because its an ever evolving project with so many contributes whatever fork you go with of your own, theirs a good chance the new goodies will work with your own modified version. For example I'm looking in to LCM right now, and woo-dent you know it someone ported it to openclaw. But nanobot doesn't have it, so I'm considering working on the LCM port to that. If i succeed i will learn a lot and also contribute to progress in my own little ways.

Guys, remember, when you set up your AI-controlled automatic machine gun in your front lawn, be sure to do it safely and pour a solid concrete foundation for it to sit atop of. We wouldn't want it to cause harm or injury by tipping over.

this is why we can't have nice things ...

[dead]

We've been seeing a lot of people run OpenClaw directly on their main machine, which is a bad idea for a few reasons: it needs broad system access, it's noisy on resources, and if something goes wrong you want a clean blast radius. The obvious answer is "just isolate it," but isolation has real friction. You need to provision a machine, handle SSH keys, configure security groups, and remember to tear things down so you're not leaking money. This post walks through the three realistic options:

Docker – lowest friction, but shares your kernel and has limits depending on what OpenClaw needs to do Dedicated hardware – best isolation, but you're paying 24/7 and it takes time to set up Cloud VM – the sweet spot for most people: true isolation, pay-per-use, tear it down when you're done

For the cloud VM path, we show how to launch a hardened OpenClaw environment on AWS, GCP, Azure, or any other cloud with a single command, handling provisioning, SSH, and auto-teardown for you.

It seems to be perfectly happy to run on virtual box with a Debian install. The host pc is running a local model. I’m quite impressed with what it’s capable of.

That’s only half of the problem.

People give OpenClaw access to their online services like mails where it can also do damage.

A hardened environment doesn’t prevent those kind of damage

As people have pointed out in other threads, you don't even need access to these services to cause problems. As long as the AI can send any bytes out, it can leak information. Like you may think of an HTTP GET as read-only, but you can pack any data you want into the URL or headers.

In the end it will all be about separation of duty between agents in a larger team and isolating the ones that need more access to your private stuff.

Wardgate acts like a drop in replacement for curl with full access control at the url / method / content level, so you can allow specific curl access to specific APIs but prevent all other outbound connections. That's what I use for my PA agent. She's very limited and can't access the open internet. Doesn't need it either

You can also stuff data into a GET request body, I've seen some devs do it and I related my disapproval about it.

There's no hardening against idiocy.

It does, of you use WardGate [1] and only allow read and archive access and only delete access on your inbox but nothing else for instance.

1 https://github.com/wardgate/wardgate