One question I've always had about these capability systems is: why isn't there a way to set capabilities from the parent process when execing? Why trust a program to set its own capabilities? I know that having a process set capabilities on itself doesn't break existing tools, but it seems like if you really wanted a robust system it would make sense to have the parent process, the user's shell for example, set the capabilities on its children, and have those capabilities be inheritable so the child could spawn other processes with the same or fewer capabilities (if it's allowed to do that at all). Is there an existing system that works this way, in or outside of the UNIX family? Or maybe some research paper written on the subject? I'd love to know.

You may be interested in OpenBSD's pledge[1][2][3].

> Why trust a program to set its own capabilities?

An example may be that a program starts out needing a wide range of capabilities but can then ratchet down to a reduced set once running, aka "privdrop".

> why isn't there a way to set capabilities from the parent process when execing?

There have been replies about other systems, so just to stick with pledge: it provides the ability to set "execpromises" to do this.

[1] https://man.openbsd.org/pledge

[2] https://www.openbsd.org/papers/eurobsdcon2017-pledge.pdf

[3] https://www.openbsd.org/papers/BeckPledgeUnveilBSDCan2018.pd...

I think you're talking about "execpromises"?[1] I'll have to study it a bit.

[1] https://bsdb0y.github.io/posts/openbsd-intro-to-update-on-pl...

I've only really messed with capsicum. You can certainly cap_enter between fork and exec, but depending on exactly what your target does, it's really not simple to do anything meaningful beyond the basic capsicum mode without changes to the program.

The way capabilities usually work is you more or less turn off the usual do-whatever-you-want syscalls, and have to do restricted things through FDs that have the capability to do them. So, no more opening any path; you have to use openat with an FD for your directory of interest. But that requires the program to understand how to use the capabilities and how to be passed them. It's not something that you can just impose.
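To make the openat point concrete, here is a rough sketch in Python, where `os.open(..., dir_fd=...)` is the openat-style call. The file name `notes.txt` and the helper names are made up for illustration. Note that this only shows the calling convention: outside of something like Capsicum's capability mode, nothing here stops the program from passing an absolute path or `..` and escaping the directory.

```python
import os
import tempfile


def read_via_dir_fd(dir_fd: int, name: str) -> bytes:
    """Open `name` relative to `dir_fd` (openat-style) and return its contents."""
    fd = os.open(name, os.O_RDONLY, dir_fd=dir_fd)
    try:
        return os.read(fd, 4096)
    finally:
        os.close(fd)


def demo() -> bytes:
    # Hypothetical setup: a scratch directory standing in for the
    # "directory of interest" the program has been granted.
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "notes.txt"), "wb") as f:
            f.write(b"hello from the sandbox dir\n")

        # The directory FD is the thing the program is handed; every
        # later open goes through it instead of through a global path.
        dir_fd = os.open(tmp, os.O_RDONLY | os.O_DIRECTORY)
        try:
            return read_via_dir_fd(dir_fd, "notes.txt")
        finally:
            os.close(dir_fd)


if __name__ == "__main__":
    print(demo())
```

The program has to be written to take `dir_fd` and use it everywhere, which is exactly why this can't simply be imposed on an unmodified binary.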

My understanding of SELinux is that it can be imposed on a program without the program's knowledge, because it's more or less matching rules against syscalls: rather than giving a restricted FD to use with openat, you restrict the options for open.

This is essentially what containers are: Bubblewrap / Docker / Podman. I think the primary issue is that very few applications on desktop systems are actually designed with sandboxing in mind, unlike, say, something on a phone.

I'm not terribly familiar with Linux container systems, cgroups and all that, but I have been down the rabbit-hole with FreeBSD's jails, and I definitely wouldn't call them a capabilities system. You can lock down the environment quite a bit, and limit or even virtualize the network stack, but you can't say, "Here process, have your standard IO streams and nothing more. Go forth and compute." The process isn't blind to its environment. You're still in the same basic UNIX user security model. It's really somewhere between chroot and full virtualization.

A default container seccomp profile will let you do quite a few things, but you can supply a different profile (some JSON) and limit the process to just a few system calls if you want, such as doing IO on already-open FDs without the ability to open new ones. I think the runtime opens the FDs before the child process starts and they are inherited.

Answering without reading TFA here. But I am familiar with capsicum.

But I am pretty sure you CAN get your capabilities from a parent process using capsicum, since they are just file descriptors.
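A minimal sketch of that idea, in Python rather than C: the parent opens the file and passes only the descriptor to the child, which never sees a path. The child script and function names are hypothetical. This shows plain FD inheritance only; the child here is not actually sandboxed, and under Capsicum you would additionally restrict the descriptor with cap_rights_limit and call cap_enter before exec.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical child program: it is told an FD number, not a file name.
CHILD_SOURCE = r"""
import os, sys
fd = int(sys.argv[1])
data = os.read(fd, 4096)
sys.stdout.write(data.decode().upper())
"""


def run_child_with_fd(path: str) -> str:
    """Open `path` in the parent and hand only the descriptor to a child."""
    fd = os.open(path, os.O_RDONLY)
    try:
        result = subprocess.run(
            [sys.executable, "-c", CHILD_SOURCE, str(fd)],
            pass_fds=(fd,),       # keep this FD open across exec
            capture_output=True,
            text=True,
            check=True,
        )
    finally:
        os.close(fd)
    return result.stdout


def demo() -> str:
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
        f.write("granted by parent\n")
        path = f.name
    try:
        return run_child_with_fd(path)
    finally:
        os.unlink(path)


if __name__ == "__main__":
    print(demo(), end="")
```

The parent decides what the child can touch by choosing which descriptors to pass, which is roughly the model the original question was asking for.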