Sure, it can all be solved; FUSE is an example of doing exactly that for less important, ancillary filesystems. I'd actually just make the protocol stateless and store fd state in the clients. My point is more general - the people who design operating systems know all about these tradeoffs and have to decide what to spend time on within a limited budget.
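To make the "stateless protocol, fd state in the clients" idea concrete, here is a minimal sketch (hypothetical code, not real FUSE or NFS internals): every request carries the full context (path, offset, length), so the server holds no per-client session table, and what would normally be kernel-side fd state lives in a client object.

```python
def serve_read(path: str, offset: int, length: int) -> bytes:
    """Server side: no session table, no open-fd map.
    Each request is self-describing; open, read, close."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)

class ClientFile:
    """Client side: holds the state a kernel fd would normally hold."""
    def __init__(self, path: str):
        self.path = path
        self.offset = 0  # the "fd state" lives here, in the client

    def read(self, length: int) -> bytes:
        data = serve_read(self.path, self.offset, length)
        self.offset += len(data)  # client advances its own cursor
        return data
```

The payoff is crash recovery: if the server process dies and restarts between two reads, nothing is lost, because there was never any server-side state to lose - the next request carries everything needed. NFSv2/v3 famously used this design for the same reason.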

Consider: crash bugs are finite. Do you spend your time on complex rearchitecting of your OS to try and fail slightly less hard when some critical code crashes, or do you spend that time fixing the bugs? If the code is big, fast-changing and third-party then it might make sense to put in the effort, hence FUSE and why graphics drivers often run a big chunk of code out of the kernel. If the code is small, stable and performance-sensitive, like a core filesystem where all your executables reside, then it doesn't make sense and the code stays in the kernel.

Browsers also use a micro-kernelish concept these days. But they're very deliberate and measured about what gets split out into extra processes and what doesn't.

The microkernel concept advocates for ignoring engineering tradeoffs in order to put everything into userspace all the time, and says precious little about how to ensure that translates into actual rewards. That's why it's an academic concept that's hardly used today.

>crash bugs are finite. Do you spend your time on complex rearchitecting of your OS to try and fail slightly less hard when some critical code crashes, or do you spend that time fixing the bugs?

Finite can still be a very large number. Clearly the former is preferable, otherwise your argument applies just as well to usermode code. Why bother having memory protection when the code should be correct anyway?

Remember the CrowdStrike bug? That wouldn't have happened had the developer been able to put the driver in user mode. The module was not critical, so the system could have kept on running and a normal service could have reported that the driver had failed to start due to an error. That's much, much, much preferable to a boot loop.

Everyone is responsible for their own software, but the OS is more critical than other pieces and also a lot more profitable, so they can afford to invest. Some userspace apps with large budgets do use microkernel architectures, most obviously browsers.

But by and large, kernel code is much more tightly scoped and stable than userspace apps. The requirements for a core filesystem change very slowly and a migration from one version to another can take years. Userspace apps might update every week and still be too slow. We tolerate much more instability in the latter than the former.

...What? How is that a response to anything I said?

Let me try again.

The engineering costs of moving things out of the kernel can be significant. If your OS isn't totally hosed then - third-party drivers excepted - there's probably a finite number of bugs you have to fix to get reliability up above your target level. It can often make sense to just sit down and fix the bugs instead of moving code out of kernel space, which will take a long time, and at the end the bugs will still be there and still need to be fixed.

This argument gets a lot weaker when you can't fix the bugs, or when code changes so frequently new bugs get added at the same rate they get fixed. AV scanners and GPU drivers are good examples of that. And they do tend to get moved out of kernel space. Most of CrowdStrike doesn't run in kernel mode, and arguably Microsoft should have kicked the remaining parts out of the kernel a long time ago. A big chunk of the GPU driver was already moved.

Unfortunately, by the nature of what AV scanners are trying to do, they have to hook into everything. I'm sure MS would love nothing more than to boot them out of Windows entirely, but that's an antitrust issue, not a technical one.