Slightly related and coming from ignorance here, but what is the general intuition for the pros and cons of a microkernel approach in OS development?

Every modern commercial OS is a hybrid architecture these days. Generally subsystems move out of the kernel when performance testing shows the cost isn't too high and there's time/money to do so. Very little moves back in, but it does happen sometimes (e.g. kernel TLS acceleration).

There's not much to say about it because there's never been an actual disagreement in philosophy. Every OS designer knows it's better for stability and development velocity to have code run in userspace and they always did. The word microkernel came from academia, a place where you can get papers published by finding an idea, giving it a name and then taking it to an extreme. So most microkernels trace their lineage back to Mach or similar, but the core ideas of using "servers" linked by some decent RPC system can be found in most every OS. It's only a question of how far you push the concept.

As hardware got faster, one of the ways OS designers used it was to move code out of the kernel. In the 90s Microsoft obtained a competitive advantage by having the GUI system run in the kernel; eventually they moved it out into a userland server. Apple nowadays has a lot of filing systems run in userspace, but not the core APFS that's used for most stuff, which is still in-kernel. Android moved a lot of stuff out of the kernel over time too. It has to be taken on a case-by-case basis.

Can you explain why TTY-PTY functionality hasn't been moved from the Linux kernel to userspace? Plan 9 did so in the 1990s or earlier (i.e., when Plan 9 was created, they initially put the functionality in userspace and left it there.)

I don't understand that, and I also don't understand why users who enjoy text-only interaction with computers are still relying on very old designs incorporating things like "line discipline", ANSI control sequences and TERMINFO databases. A large chunk of cruft was introduced for performance reasons in the 1970s and even the 1960s, but the performance demands of writing a grid of text to a screen are very easily handled by modern hardware, and I don't understand why the cruft hasn't been replaced with something simpler.

In other words, why do users who enjoy text-only interaction with computers still emulate hardware (namely, dedicated terminals) designed in the 1960s and 1970s that mostly just displays a rectangular grid of monospaced text and consequently would be easy to implement afresh using modern techniques?

There's a bunch of complexity in every terminal emulator, for example for doing cursor addressing. Network speeds are fast enough these days (and RAM is cheap enough) that cursor addressing is unnecessary: every update can just re-send the entire grid of text to be shown to the user.
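To make the trade-off concrete, here is a rough sketch comparing the two approaches for an 80x24 grid where a single cell changes. The function names `full_frame` and `diff_frame` are made up for illustration; the only real protocol detail used is the ANSI cursor-position sequence (`ESC [ row ; col H`, 1-based).

```python
ESC = "\x1b"

def full_frame(grid):
    """Serialize the whole grid, one line per row, with no cursor addressing."""
    return "\n".join(grid)

def diff_frame(old, new):
    """Emit an ANSI cursor-position sequence plus the new character
    for every cell that differs between the two grids."""
    out = []
    for r, (old_row, new_row) in enumerate(zip(old, new)):
        for c, (a, b) in enumerate(zip(old_row, new_row)):
            if a != b:
                # CSI row;col H moves the cursor; rows and columns are 1-based.
                out.append(f"{ESC}[{r + 1};{c + 1}H{b}")
    return "".join(out)

# An empty 80x24 screen, then the same screen with one cell changed.
old = [" " * 80 for _ in range(24)]
new = list(old)
new[4] = new[4][:10] + "X" + new[4][11:]

full = full_frame(new)
diff = diff_frame(old, new)
print(len(full), len(diff))  # bytes per update: whole grid vs. one addressed cell
```

The full frame is a couple of kilobytes per update, which is trivial on modern links, while the diff is a handful of bytes but requires both ends to agree on the current screen state.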

Also, I think the protocol used in communication between the terminal and the computer is stateful for no reason that remains valid nowadays.

The usual reason for all of this is that programmer time is expensive (even if you're a volunteer, you have limited hours available), and not many people want to volunteer to wade through tons of legacy tech debt. That's especially true when the outcome will be an OS that behaves identically to before. A lot of stuff stays in the kernel because it's just hard to move it out.

Bear in mind, moving stuff out of the kernel is only really worth it if you can come up with a reasonable specification for how to solve a bunch of new problems. If you don't solve them it's easy to screw up and end up with a slower system yet no benefit.

Consider what happens if you are overenthusiastic and try to move your core filesystem into userspace. What does the OS do if your filesystem process segfaults? Probably it can't do anything at that point beyond block everything and try to restart it? But every process then lost its connection to the FS server and so all the file handles are suddenly invalidated, meaning every process crashes. You might as well just panic and reboot, so, it might as well stay in the kernel. And what about security? GNU Hurd jumped on the microkernel bandwagon but ended up opening up security vulnerabilities "by design" because they didn't think it through deeply enough (in fairness, these issues are subtle). Having stuff be in the kernel simplifies your architecture tremendously and can avoid bugs as well as create them. People like to claim microkernels are inherently more secure but it's not the case unless you are very careful. So it's good to start monolithic and spin stuff out only when you're ready for the complexity that comes with that.

Linux also has the unusual issue that the kernel and userspace are developed independently, which is an obvious problem if you want to move functionality between the two. Windows and macOS can make assumptions about userspace that Linux doesn't.

If you want to improve terminals then the wrong place to start is fiddling with moving code between kernel and user space. The right place to start is with a brand new protocol that encodes what you like about text-only interaction and then try to get apps to adopt it or bridge old apps with libc shims etc.

>Consider what happens if you are overenthusiastic and try to move your core filesystem into userspace. What does the OS do if your filesystem process segfaults? Probably it can't do anything at that point beyond block everything and try to restart it? But every process then lost its connection to the FS server and so all the file handles are suddenly invalidated, meaning every process crashes. You might as well just panic and reboot, so, it might as well stay in the kernel.

I mean, it's not necessarily true that if a filesystem process crashes, every other process crashes. Depending on the design, each FS process may serve requests for each mountpoint, or for each FS type. That already is a huge boon to stability, especially if you're using experimental FSs. On top of that, I think the broken connection could be salvageable by the server storing handle metadata in the kernel and retrieving it when the kernel revives the process. It's hardly an insurmountable problem.

Sure it can all be solved, FUSE is an example of doing that for less important ancillary filesystems. I'd actually just make the protocol stateless and store fd state in the clients. My point is more general - the people who design operating systems know all about these tradeoffs and have to decide what to spend time on within a limited budget.
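A minimal sketch of what "stateless protocol, fd state in the clients" could look like, assuming a toy in-memory server. `StatelessFileServer` and `ClientHandle` are hypothetical names, not any real OS API; the point is only that every request carries the path and offset, so a restarted server can serve old handles.

```python
class StatelessFileServer:
    """Toy server: keeps no per-client state between requests."""

    def __init__(self, files):
        self.files = files  # path -> bytes, standing in for the disk

    def read(self, path, offset, length):
        # Everything needed to answer is in the request itself.
        return self.files[path][offset:offset + length]


class ClientHandle:
    """Client-side 'file descriptor': the path and offset live here."""

    def __init__(self, path):
        self.path = path
        self.offset = 0

    def read(self, server, length):
        data = server.read(self.path, self.offset, length)
        self.offset += len(data)
        return data


disk = {"/etc/motd": b"hello world"}
server = StatelessFileServer(disk)
handle = ClientHandle("/etc/motd")

first = handle.read(server, 5)

# Simulate the server crashing and being restarted: a fresh instance
# with no memory of the client, backed by the same "disk".
server = StatelessFileServer(disk)

second = handle.read(server, 6)
print(first, second)
```

The cost is that each request is larger and the server has to re-validate it every time, which is part of the engineering trade-off being discussed.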

Consider: crash bugs are finite. Do you spend your time on complex rearchitecting of your OS to try and fail slightly less hard when some critical code crashes, or do you spend that time fixing the bugs? If the code is big, fast changing and third party then it might make sense to put in the effort, hence FUSE and why graphics drivers often run a big chunk of code out of kernel. If the code is small, stable and performance sensitive, like a core filesystem where all your executables reside, then it doesn't make sense and stays in.

Browsers also use a micro-kernelish concept these days. But they're very deliberate and measured about what gets split out into extra processes and what doesn't.

The microkernel concept advocates for ignoring engineering tradeoffs in order to put everything into userspace all the time, and says precious little about how to ensure that translates into actual rewards. That's why it's an academic concept that's hardly used today.

>crash bugs are finite. Do you spend your time on complex rearchitecting of your OS to try and fail slightly less hard when some critical code crashes, or do you spend that time fixing the bugs?

Finite can still be a very large number. Clearly the former is preferable, otherwise your argument applies just as well to usermode code. Why bother having memory protection when the code should be correct anyway?

Remember the CrowdStrike bug? That wouldn't have happened had the developer been able to put the driver in user mode. The module was not critical, so the system could have kept on running and a normal service could have reported that the driver had failed to start due to an error. That's much, much, much preferable to a boot loop.
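As a rough illustration of that failure mode, here is a sketch where the "driver" is just a child process and the parent plays the role of the service manager. The `start_driver` helper and the crashing command are invented for the example; it says nothing about how any real driver framework works.

```python
import subprocess
import sys

def start_driver(argv):
    """Run a user-mode 'driver' as a child process and report how it went.
    A failure is reported to the caller instead of taking the system down."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    if proc.returncode != 0:
        return f"driver failed (exit {proc.returncode}); continuing without it"
    return "driver running"

# Stand-in for a driver that dies on startup, e.g. after reading bad data.
crashing = [sys.executable, "-c", "import sys; sys.exit(3)"]
healthy = [sys.executable, "-c", "pass"]

status = start_driver(crashing)
print(status)
print(start_driver(healthy))
```

The parent keeps running either way, which is the whole point: the crash becomes an error message rather than a boot loop.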

Everyone is responsible for their own software, but the OS is more critical than other pieces and also a lot more profitable, so they can afford to invest. Some userspace apps with large budgets do use microkernel architectures, most obviously browsers.

But by and large, kernel code is much more tightly scoped and stable than userspace apps. The requirements for a core filesystem change very slowly and a migration from one version to another can take years. Userspace apps might update every week and still be too slow. We tolerate much more instability in the latter than the former.

...What? How is that a response to anything I said?

Let me try again.

The engineering costs of moving things out of the kernel can be significant. If your OS isn't totally hosed then - third party drivers excepted - there's probably a finite number of bugs you have to solve to get reliability up above your target level. It can often make sense to just sit down and fix the bugs instead of moving code out of kernel space, which will take a long time and at the end the bugs will still be there and still need to be fixed.

This argument gets a lot weaker when you can't fix the bugs, or when code changes so frequently new bugs get added at the same rate they get fixed. AV scanners and GPU drivers are good examples of that. And they do tend to get moved out of kernel space. Most of CrowdStrike doesn't run in kernel mode, and arguably Microsoft should have kicked the remaining parts out of the kernel a long time ago. A big chunk of the GPU driver was already moved.

Unfortunately by the nature of what AV scanners are trying to do they try to get everywhere. I'm sure MS would love nothing more than to boot them out of Windows but that's an antitrust issue not a technical issue.

I think the fact that the line protocol for DEC VT terminals is standardized as ANSI X3.64 is why the issue hasn’t been addressed or modernized.

See https://en.m.wikipedia.org/wiki/ANSI_escape_code

Some simpler CPU boards for embedded systems have no onboard graphics, they just have a serial port, so you have to use a terminal or terminal emulator to talk to them.

> Every modern commercial OS

Every *general-purpose* OS.

Nintendo's 3DS OS and Switch 1+2 OS are bespoke and strictly microkernel-based (with the exception of DMA-330 CoreLink DMA handling on 3DS, if you want to count it as such), and these have been deployed on hundreds of millions of commercially sold devices.

Microkernels are conceptually cleaner, and easier to make secure, but in practice generally slower than monolithic kernels.

Gernot Heiser would strongly disagree with you on the last one :D