> The Reddit post I’d seen earlier? That guy got completely owned because his container was running as root. The malware could: [...]
Is that the case, though? My understanding was that even if I run a Docker container as root and the container is 100% compromised, there would still need to be a vulnerability in Docker for it to “attack” the host, or am I missing something?
While this is true, the general security stance on this is: Docker is not a security boundary. You should not treat it like one. It will only give you _process level_ isolation. If you want something with better security guarantees, you can use a full VM (KVM/QEMU), something like gVisor[1] to limit the attack surface of a containerized process, or something like Firecracker[2] which is designed for multi-tenancy.
The core of the problem here is that process isolation doesn't save you from whole classes of attack vectors or misconfigurations that open you up to nasty surprises. Docker is great, just don't think of it as a sandbox to run untrusted code.
1. https://gvisor.dev/
2. https://firecracker-microvm.github.io/
I hear the "Docker is not a security boundary." mantra all the time, and IIRC it was the official stance of the Docker project a long time ago, but is this really true?
Of course, if you have a kernel exploit you'd be able to break out (this is what gVisor mitigates to some extent), and nothing seems to really protect against Rowhammer/memory-timing style attacks (though they don't seem to be commonly used). Beyond this, the main misconfigurations seem to be overly broad volume bindings (e.g. something that allows access to the Docker control socket from inside the container, or an obviously stupid mount like mounting your root inside the container).
Am I missing something?
Virtual machines are treated as a security boundary despite the fact that with enough R&D they are not. Hosting minecraft servers in virtual machines is fine, but not a great idea if they’re cohosted on a machine that has billions of dollars in crypto or military secrets.
Docker is pretty much the same but supposedly more flimsy.
Both have non-obvious configuration weaknesses that can lead to escapes.
> Virtual machines are treated as a security boundary despite the fact that with enough R&D they are not. Hosting minecraft servers in virtual machines is fine, but not a great idea if they’re cohosted on a machine that has billions of dollars in crypto or military secrets.
While I generally agree with the technical argument, I fail to see the threat model here. Is it that some external threat would have prior knowledge that an important target is in close proximity to a less hardened one? It doesn't seem viable to me for nation states to spend the expensive R&D to compromise hobbyist-adjacent services in a hope that they can discover more valuable data on the host hypervisor.
Once such expensive malware is deployed, there's a huge risk that all the R&D money is spent on potentially just reconnaissance.
Yes. Docker too.
Yeah, but why would somebody co-host military secrets or billions of dollars? It's a bit of a stretch.
I think you’re missing the point, which was that high value targets adjacent to soft targets make escapes a legitimate target, but in low value scenarios vm escapes aren’t worth the R&D
but if you can do it at scale it might still be worth it, like owning thousands of machines
that's a really good point .. but, I think 99% of docker users believe it is a sandbox and treat it as such.
And not without cause. We've been pitching docker as a security improvement for well over a decade now. And it is a security improvement, just not as much as many evangelists implied.
Must depend on who you've been talking to. Docker's not been pitched for security in the circles I run in, ever.
Not 99%. Many people run a hypervisor and then a VM just for Docker.
Attacker now needs a Docker exploit and then a VM exploit before getting to the hypervisor (and, no, pwning the VM ain't the same as pwning the hypervisor).
Agreed - this is actually pretty common in the Proxmox realm of hosters. I segment container nodes using LXC, and in some specific cases I'll use a VM.
Not only does it allow me to partition the host for workloads but I also get security boundaries as well. While it may be a slight performance hit the segmentation also makes more logical sense in the way I view the workloads. Finally, it's trivial to template and script, so it's very low maintenance and allows for me to kill an LXC and just reprovision it if I need to make any significant changes. And I never need to migrate any data in this model (or very rarely).
'Double-bagging it' was what we called it in my day.
it is a sandbox against unintentional attacks and mistakes (sudo rm -rf /)
but will not stop serious malware
Firstly, the attacker just wants to mine Monero with CPU, they can do that inside the container.
Secondly, even if your Docker container is configured properly, the attacker gets to call themselves root and talk to the kernel. It's a security boundary, sure, but it's not as battle-tested as the isolation of not being root, or the isolation between VMs.
Thirdly, in the stock configuration processes inside a docker container can use loads of RAM (causing random things to get swapped to disk or OOM killed), can consume lots of CPU, and can fill your disk up. If you consider denial-of-service an attack, there you are.
Fourthly, there are a bunch of settings that disable the security boundary, and a lot of guides online will tell you to use them. Doing something in Docker that needs to access hot-plugged webcams? Hmm, it's not working unless I set --privileged - oops, there goes the security boundary. Trying to attach a debugger while developing and you set CAP_SYS_PTRACE? Bypasses the security boundary. Things like that.
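On the third point, the stock configuration can be tightened with resource limits. A minimal sketch, assuming the standard `docker run` flags `--memory`, `--cpus` and `--pids-limit`; the function name `limited_run`, the `DOCKER` override and the limit values are made up for illustration:

```shell
# Hypothetical wrapper: run a container with memory, CPU and process-count caps.
# DOCKER can be overridden (e.g. DOCKER=echo) to print the command instead of
# running it. The limit values are arbitrary examples.
limited_run() {
  "${DOCKER:-docker}" run --rm \
    --memory=512m \
    --cpus=1 \
    --pids-limit=256 \
    "$@"
}
```

Usage would look like `limited_run alpine sh -c 'echo hello'`. This does not cap disk usage; as far as I know that depends on the storage driver in use.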
You really need to use user namespaces to get this kind of security protection -- running as root inside a container without user namespaces is not secure. Yes, breakouts often require some other bug or misconfiguration but the margin for error is non-existent (for instance, if you add CAP_SYS_PTRACE to your containers it is trivial to break out of them and container runtimes have no way of protecting against that). Almost all container breakouts in the past decade were blocked by user namespaces.
Unfortunately, user namespaces are still not the default configuration with Docker (even though the core issues that made using them painful have long since been resolved).
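For reference, turning it on is a daemon-level setting. A minimal sketch, assuming the `userns-remap` key in the Docker daemon configuration (usually `/etc/docker/daemon.json`); the scratch file is only there so the fragment can be reviewed and merged by hand:

```shell
# Write the fragment to a scratch file rather than touching the real config.
fragment=$(mktemp)
cat > "$fragment" <<'EOF'
{
  "userns-remap": "default"
}
EOF

# Show the fragment so it can be merged into the daemon configuration manually.
cat "$fragment"
```

The daemon needs a restart after the change, and existing images and containers will not be visible under the remapped storage directory, so it is easier to do on a fresh host.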
>there still would need to be a vulnerability in docker for it to “attack” the host, or am I missing something?
Not necessarily a vulnerability per se. A bridged adapter, for example, lets you do a lot. A few years ago there was a story about someone who got root in a container and, because the container used a bridged adapter, was able to intercept the traffic of account info updates on GCP.
If the container is running in privileged mode you can just talk to the docker socket to the daemon on the host, spawn a new container with direct access to the root filesystem, and then change anything you want as root.
Notably, if you run docker-in-docker, Docker is probably not a security boundary. Try this inside any dind container (especially devcontainers): `docker run -it --rm --pid=host --privileged -v /:/mnt alpine sh`
I disagree with other commenters here that Docker is not a security boundary. It's a fine one, as long as you don't disable the boundary, which is as easy as running a container with `--privileged`. I wrote about secure alternatives for devcontainers here: https://cgamesplay.com/recipes/devcontainers/#docker-in-devc...
Containers are never a security boundary. If you configure them correctly, avoid all the footguns, and pray that there's no container escape vulnerabilities that affect "correctly" configured containers then they can be a crude approximation of a security boundary that may be enough for your use case, but they aren't a suitable substitute for hardware backed virtualization.
The only serious company that I'm aware of which doesn't understand that is Microsoft, and the reason I know that is because they've been embarrassed again and again by vulnerabilities that only exist because they run multitenant systems with only containers for isolation
Virtual machines are never a security boundary. If you configure them correctly, avoid all the footguns, and pray that there's no VM escape vulnerabilities that affect "correctly" configured VMs then they can be a crude approximation of a security boundary that may be enough for your use case, but they aren't a suitable substitute for entirely separate hardware.
It's all turtles, all the way down.
Yeah, in some (rare) situations physical isolation is a more appropriate level of security. Or if you want to land somewhere in between, you can use VMs with single-tenant NUMA nodes.
But for a typical case, VMs are the bare minimum to say you have a _secure_ isolation boundary because the attack surface is way smaller.
Yeah, so secure.
https://support.broadcom.com/web/ecx/support-content-notific...
https://nvd.nist.gov/vuln/detail/CVE-2019-5183
https://nvd.nist.gov/vuln/detail/CVE-2018-12130
https://nvd.nist.gov/vuln/detail/CVE-2018-2698
https://nvd.nist.gov/vuln/detail/CVE-2017-4936
In the end you need to configure it properly and pray there's no escape vulnerabilities. The same standard you applied to containers to say they're definitely never a security boundary. Seems like you're drawing some pretty arbitrary lines here.
Docker containers with root have rootish rights on the host machine too because the userid will just be 0 for both. So if you have, say, a bind mount that you play fast and loose with, the docker user can create 0777 files outside the docker container, and now we're almost done. Even worse if "just to make it work" someone runs the container with --privileged and then makes the terminal mistake of exposing that container to the internet.
Can you explain this a bit further? Wouldn't that 0777 file outside docker be still executed inside the container and not on the host?
I believe they meant you could create an executable that is accessible outside the container (maybe even a setuid root one), and depending on the path settings, it might be possible to get the user to run it on the host.
Imagine naming this executable "ls" or "echo" and someone having "." in their path (which is why you shouldn't): as soon as you run "ls" in this directory, you've run compromised code.
There are obviously other ways to get that executable to be run on the host; this is just a simple example.
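The "." in PATH case can be reproduced without Docker at all. A minimal sketch, where a scratch directory stands in for the bind-mounted host directory and the fake `ls` only prints a message (all names here are made up for the demo):

```shell
# Scratch directory standing in for a bind-mounted host directory.
demo_dir=$(mktemp -d)

# "Attacker" side: drop a world-executable file named like a common command.
printf '#!/bin/sh\necho "attacker code ran"\n' > "$demo_dir/ls"
chmod 0777 "$demo_dir/ls"

# "Victim" side: a shell with "." at the front of PATH runs `ls` in that directory.
hijack_output=$(cd "$demo_dir" && PATH=".:$PATH" && ls)
echo "$hijack_output"

# Clean up the scratch directory.
rm -rf "$demo_dir"
```

Instead of a directory listing, the victim's `ls` should print the attacker's message, because the shell finds `./ls` before the real one.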
Another example is they would enumerate your directories and find the names of common scripts and then overwrite your script. Or to be even sneakier, they can append their malicious code to an existing script in your filesystem. Now each time you run your script, their code piggybacks.
OTOH if I had written such a script for Linux I'd be looking to grab the contents of $(hist) $(env) $(cat /etc/{group,passwd})... then enumerate /usr/bin/ /usr/local/bin/ and the XDG_{CACHE,CONFIG} dirs - some plaintext credentials are usually here.
Then $HOME/.{aws,docker,claude,ssh}
Basically the attacker just needs to know their way around your OS. The script enumerating these directories is the 0777 script they were able to write from inside the root access container.
If your chosen development environment supports it, look into distroless or empty base containers, and run as --read-only if you can.
Go and Rust tend to lend themselves to these more restrictive environments a bit better than other options.
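A rough sketch of what a more restrictive invocation could look like. Only `--read-only` comes from the comment above; `--tmpfs`, `--cap-drop=ALL`, `--security-opt=no-new-privileges` and the non-root `--user` are extra hardening flags added here as assumptions, and the function name and `DOCKER` override are made up:

```shell
# Hypothetical wrapper: read-only root filesystem, a writable tmpfs for /tmp,
# no Linux capabilities, no privilege escalation, and a non-root user
# (65534 is commonly "nobody").
# DOCKER can be overridden (e.g. DOCKER=echo) to print the command instead.
locked_down_run() {
  "${DOCKER:-docker}" run --rm \
    --read-only \
    --tmpfs /tmp \
    --cap-drop=ALL \
    --security-opt=no-new-privileges \
    --user 65534:65534 \
    "$@"
}
```

A statically linked Go or Rust binary usually runs fine like this; images that expect to write to their own filesystem or start as root will need adjusting.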
Container escapes exist. Now the question is whether the attacker has exploited it or not, and what the risk is.
Are you holding millions of dollars in crypto/sensitive data? Better assume the machine and data is compromised and plan accordingly.
Is this your toy server for some low-value things where nothing bad can happen besides a bit of embarrassment even if you do get hit by a container escape zero-day? You're probably fine.
This attack is just a large-scale automated attack designed to mine cryptocurrency; it's unlikely any human ever actually logged into your server. So cleaning up the container is most likely fine.
I think a root container can talk to docker daemon and launch additional containers...with volume mounts of additional parts of file system etc. Not particularly confident about that one though
Unintentional vulnerabilities in Docker and the kernel aside, it can only do that if it has access to the Docker API (usually through a bind mount of the Unix socket). Having access to the Docker API is equivalent to having root on the host.
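A quick way to check from inside a container, assuming the socket sits at the usual `/var/run/docker.sock` path (the helper name is made up):

```shell
# Hypothetical helper: succeed if a Unix socket exists at the given path
# (default: the usual Docker socket location).
has_docker_sock() {
  [ -S "${1:-/var/run/docker.sock}" ]
}

if has_docker_sock; then
  echo "Docker socket is reachable from here: treat this container as host root"
else
  echo "no Docker socket at the default path"
fi
```

A negative result only rules out the default path; the socket can be mounted elsewhere, and the API can also be exposed over TCP.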
Well $hit. I have been using Docker for installing NPM modules in interactive projects I was testing out. I believed Docker blocked access to the underlying host (my computer).
Thanks for mentioning it - but now... how does one deal with this?
If you didn’t mount docker.sock or any directory above it (i.e. / or /run by default) or run your containers as --privileged, you’re probably fine with respect to this angle. I’d still recommend rootless containers under unprivileged users* or VMs for extra comfort. Qubes (https://www.qubes-os.org/) is good, even if it’s a little clunkier than it could be.
* but if you’re used to bind-mounting, they’ll be a hassle
Edit: This is by no means comprehensive, but I feel compelled to point it out specifically for some reason: remember not to mount .git writable, folks! Write access to .git is arbitrary code execution as whoever runs git.
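To make the .git point concrete, here is a small sketch using a hook. It assumes git is installed and no global `core.hooksPath` is set; the scratch repository stands in for a bind-mounted project, and the hook only creates a marker file:

```shell
# Scratch repository standing in for a bind-mounted project directory.
repo=$(mktemp -d)
marker="$repo/hook-ran"
git init -q "$repo"
mkdir -p "$repo/.git/hooks"

# "Container" side: anything with write access to .git can install a hook.
printf '#!/bin/sh\ntouch "%s"\n' "$marker" > "$repo/.git/hooks/pre-commit"
chmod +x "$repo/.git/hooks/pre-commit"

# "Host" side: an ordinary commit now runs that hook as the committing user.
git -C "$repo" -c user.name=demo -c user.email=demo@example.invalid \
  commit -q --allow-empty -m "demo"

# The marker file exists only if the hook actually ran.
[ -f "$marker" ] && echo "hook executed"
```

Hooks are only one route; a writable `.git/config` opens up others.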
As sibling mentioned, unless you or the runtime explicitly mount the docker socket, this particular scenario shouldn't affect you.
You might still want to tighten things up. Just adding on to the "rootless" part - running the container runtime as an unprivileged user on the host instead of root - you also want to run npm/node as an unprivileged user inside the container. I still see many defaulting to running as root inside the container since that's the default of most images. OP touches on this.
For rootless podman, this will run as a user with your current uid and map ownership of mounts/volumes:
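Something along these lines, assuming `--userns=keep-id` is the option meant here; the function name, the `PODMAN` override and the `/app` mount point are placeholders:

```shell
# Hypothetical wrapper: rootless podman, keeping the caller's UID/GID inside
# the container so files in the bind mount stay owned by the calling user.
# PODMAN can be overridden (e.g. PODMAN=echo) to print the command instead.
keepid_run() {
  "${PODMAN:-podman}" run --rm -it \
    --userns=keep-id \
    -v "$PWD":/app \
    -w /app \
    "$@"
}
```

Usage would be something like `keepid_run docker.io/library/node:lts npm install` from the project directory.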
Podman makes this easier to do safely by default. I'd suggest checking that out.
There would be, but a lot of docker containers are misconfigured or unnecessarily privileged, allowing for escape.
Also, if you've been compromised, you may have a rootkit that hides itself from the filesystem, so you can't be sure of a file's existence through a simple `ls` or `stat`.
> but a lot of docker containers are misconfigured or unnecessarily privileged, allowing for escape
Honestly, citation needed. It's very rare unless you're literally giving the container access to write to /usr/bin or other binaries the host is running, to reconfigure your entire /etc, access to sockets like Docker's, or some other insane level of overreach I doubt even the least educated Docker user would attempt.
While of course they should be scoped properly, people act like some elusive 0-day container escape will get used on their minecraft server or personal blog that has otherwise sane mounts, non-admin capabilities, etc. You aren't that special.
As a maintainer of runc (the runtime Docker uses), if you aren't using user namespaces (which is the case for the vast majority of users) I would consider your setup insecure.
And a shocking number of tutorials recommend bind-mounting docker.sock into the container without any warning (some even tell you to mount it "ro" -- which is even funnier since that does nothing). I have a HN comment from ~8 years ago complaining about this.
Half the vendor software I come across asks you to mount devices from the host, add capabilities or run the container in privileged mode because their outsourced lowest bidder developers barely even know what a container is. I doubt even the smallest minority of their customers protest against this because apparently the place I work at is always the first one to have a problem with it.
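A rough way to screen a vendor's suggested `docker run` line before using it. This is only a sketch: the function name `risky_flags` is made up, the patterns cover just the flags mentioned in this thread, and only the `--flag=value` spelling is recognized:

```shell
# Hypothetical helper: print a warning for each `docker run` argument that
# weakens or removes container isolation. Not an exhaustive list.
risky_flags() {
  for arg in "$@"; do
    case "$arg" in
      --privileged|--privileged=true)
        echo "risky: $arg (disables most isolation)" ;;
      --pid=host)
        echo "risky: $arg (shares a host namespace)" ;;
      --cap-add=*SYS_PTRACE*|--cap-add=ALL)
        echo "risky: $arg (powerful capability)" ;;
      *docker.sock*)
        echo "risky: $arg (exposes the Docker API)" ;;
      /:*)
        echo "risky: $arg (mounts the host root filesystem)" ;;
    esac
  done
}

# Example: the docker-in-docker command quoted elsewhere in this thread.
risky_flags run -it --rm --pid=host --privileged -v /:/mnt alpine sh
```

A clean result does not mean the command is safe; it only means none of these particular patterns matched.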
I've seen many articles with `-v /var/run/docker.sock:/var/run/docker.sock` and no scary warning.
What would the intended use case for that be?
Diun
Either docker or a kernel level exploit. With non-VM containers, you are sharing a kernel.