A problem with that approach is that libc can, after an upgrade, start making syscalls you were not expecting. For example, the first time you call `printf()` it calls `newfstatat()`. Only the first time. Maybe in the future it'll call it more often than that, and then your binary breaks.
I'm not sure what glibc's latest policy is on linking statically, but at least it used to be basically unsupported and bugs about it were ignored. But even if it is supported, you can't know whether it uses `dlopen()` for something under some configurations or runtime circumstances.
Or maybe once you juggle more than X file descriptors some code switches from using `poll()` to using `select()` (or `epoll_wait()`).
My thoughts last time I looked at seccomp: https://blog.habets.se/2022/03/seccomp-unsafe-at-any-speed.h...
This is a problem, but fwiw libcs should be falling back to the older system calls. You can block `clone3` today and see that your libc will fall back to `clone`.
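A rough sketch of the fallback pattern being described, not glibc's actual code. `blocked_clone3` and `plain_clone` are hypothetical stand-ins, and the sketch assumes the filter rejects the new call with `ENOSYS`. As far as I know that is the error libcs treat as "not available, try the old one"; a filter that kills the process or returns `EPERM` would not trigger the fallback.

```python
import errno


def call_with_fallback(new_call, old_call):
    """Try new_call; if it fails with ENOSYS, use old_call instead."""
    try:
        return new_call()
    except OSError as e:
        if e.errno != errno.ENOSYS:
            # Any other error (EPERM, EINVAL, ...) is a real failure.
            raise
        return old_call()


def blocked_clone3():
    # Hypothetical stand-in for a syscall that a filter rejects with ENOSYS.
    raise OSError(errno.ENOSYS, "Function not implemented")


def plain_clone():
    # Hypothetical stand-in for the older syscall that is still allowed.
    return "clone"


if __name__ == "__main__":
    print(call_with_fallback(blocked_clone3, plain_clone))
```

The point is that the fallback depends on the error the filter returns, so the seccomp policy and the libc have to agree on it.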
> A problem with that approach is that libc can after an upgrade decide to start doing syscalls you were not expecting.
That would break Capsicum, too, so I don’t see how that’s a problem when “comparing Capsicum to using seccomp in the same way”.
That's the approach I meant by "that approach": the library the parent commenter was talking about writing for a customer. Compare this to Landlock or OpenBSD's pledge/unveil.
Now that Landlock actually is a thing, have you considered writing another followup? Given what I've seen of Landlock, I expect it'll be spicy...
I took the bait.
“The goal of Landlock is to enable restriction of ambient rights (e.g. global filesystem or network access) for a set of processes. Because Landlock is a stackable LSM [(Linux Security Module)], it makes it possible to create safe security sandboxes as new security layers in addition to the existing system-wide access-controls. ... Landlock empowers any process, including unprivileged ones, to securely restrict themselves.”
https://docs.kernel.org/userspace-api/landlock.html
I've actually found it pretty fine. It doesn't have full coverage, but they have a system of adding coverage (ABI versions), and it covers a lot of the important stuff.
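For the ABI versions: a minimal sketch of asking the running kernel which Landlock ABI it supports, using the raw syscall through `ctypes`. The syscall number 444 is an assumption that holds on x86_64 and arm64; check your architecture's headers. `landlock_abi_version` is a name made up for this example.

```python
import ctypes
import sys

# Assumption: 444 is landlock_create_ruleset on this architecture
# (true for x86_64 and arm64; verify against your unistd.h).
SYS_LANDLOCK_CREATE_RULESET = 444
LANDLOCK_CREATE_RULESET_VERSION = 1 << 0


def landlock_abi_version():
    """Return the kernel's Landlock ABI version, or None if unavailable."""
    if not sys.platform.startswith("linux"):
        return None
    libc = ctypes.CDLL(None, use_errno=True)
    libc.syscall.restype = ctypes.c_long
    ret = libc.syscall(
        ctypes.c_long(SYS_LANDLOCK_CREATE_RULESET),
        ctypes.c_void_p(None),  # attr = NULL when only querying the version
        ctypes.c_size_t(0),     # size = 0 when only querying the version
        ctypes.c_ulong(LANDLOCK_CREATE_RULESET_VERSION),
    )
    if ret < 0:
        # ENOSYS: kernel built without Landlock; EOPNOTSUPP: disabled at boot.
        # A seccomp filter around this process can also make the call fail.
        return None
    return ret


if __name__ == "__main__":
    print("Landlock ABI:", landlock_abi_version())
```

A program would then only request the access rights that exist in the reported version and treat anything newer as best effort.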
The one restriction I'm not sure about is that you can't say "~/ except ~/.gnupg". You have to actually enumerate everything you do want to allow. But maybe that's for the best. Both because it mandates rules not becoming too complex to reason about, and because that's a weird requirement in general. Like did you really mean to give access to ~/.gnupg.backup/? Probably not. Probably best to enumerate the allowlist.
And if you really want to, I guess you can listdir() and compose the exhaustive list manually, after subtracting the "except X".
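Something like this sketch, which only builds the list of paths and does not touch Landlock itself. `allowlist_except` is a hypothetical helper and the directory names are made up for the demo.

```python
import os
import tempfile


def allowlist_except(base, excluded):
    """Return the direct children of `base` whose names are not in `excluded`."""
    excluded = set(excluded)
    return sorted(
        os.path.join(base, name)
        for name in os.listdir(base)
        if name not in excluded
    )


if __name__ == "__main__":
    # Demo in a throwaway directory standing in for ~/.
    with tempfile.TemporaryDirectory() as home:
        for name in (".gnupg", ".gnupg.backup", "src", "Documents"):
            os.mkdir(os.path.join(home, name))
        for path in allowlist_except(home, [".gnupg"]):
            print(path)
```

Note that `.gnupg.backup` still ends up in the list, which is exactly the "did you really mean that?" problem. The list is also a snapshot: anything created under the base directory afterwards is not covered.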
I find seccomp unusable and not fit for purpose, but Landlock closes many doors.
Maybe you know better? I'd love to hear your take.
I definitely don't know better. After taking a few more looks at Landlock, I'm not even sure what my objections were; I probably got it confused with something else entirely. Confusion and ignorance on my part, I guess.