> I'm not normally keen to "well actually" people with the C standard, but .. if you're writing in assembly, you're not writing in C.
These days on Linux/BSD/Solaris/macOS you can use makecontext()/swapcontext() from ucontext.h, and on the architectures that matter it performs roughly the same as the custom assembly everyone used to write. And on Windows you already have the fiber functions in the Windows API to trampoline with.
I had to support a number of architectures in libdex for Debian. This is GNOME code of course, which isn't everyone's cup of C. (It also supports BSDs/Linux/macOS/Solaris/Windows).
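For readers who haven't used the ucontext.h calls mentioned above, here is a minimal sketch of a coroutine that yields three times back to its caller. It is not taken from libdex; the names `coroutine_body`, `run_coroutine_demo` and `STACK_SIZE` are made up for illustration, and the 64 KiB stack is an arbitrary assumption. On macOS you would likely also need to define `_XOPEN_SOURCE` before the includes.

```c
#include <stdlib.h>
#include <ucontext.h>

#define STACK_SIZE (64 * 1024)   /* arbitrary size chosen for this sketch */

static ucontext_t main_ctx, co_ctx;
static int yielded;              /* how many times the coroutine has yielded */

/* Hypothetical coroutine: yields to the caller three times, then returns. */
static void coroutine_body(void)
{
    for (int i = 0; i < 3; i++) {
        yielded++;
        swapcontext(&co_ctx, &main_ctx);   /* yield: save us, resume caller */
    }
    /* Returning here activates co_ctx.uc_link, i.e. main_ctx. */
}

/* Runs the coroutine to completion; returns the yield count, or -1 on error. */
int run_coroutine_demo(void)
{
    void *stack = malloc(STACK_SIZE);
    if (stack == NULL)
        return -1;

    yielded = 0;
    if (getcontext(&co_ctx) == -1) {       /* initialise before makecontext */
        free(stack);
        return -1;
    }
    co_ctx.uc_stack.ss_sp = stack;
    co_ctx.uc_stack.ss_size = STACK_SIZE;
    co_ctx.uc_link = &main_ctx;            /* where to go when the body returns */
    makecontext(&co_ctx, coroutine_body, 0);

    /* Three resumes hit a yield each; the fourth lets the body finish. */
    for (int i = 0; i < 4; i++)
        swapcontext(&main_ctx, &co_ctx);

    free(stack);
    return yielded;
}
```

Calling `run_coroutine_demo()` from `main` should return 3. Each `swapcontext()` here also saves and restores the signal mask, which is the cost discussed in the replies below.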
Unfortunately swapcontext requires saving and restoring the signal mask, which, at least on Linux, requires a syscall, so it is going to be at least a hundred times slower than a hand-rolled implementation.
Also, although not likely to be removed anytime soon from existing systems, POSIX declared the context API obsolescent a while ago, and POSIX.1-2008 removed it from the standard.
Stackful coroutines also can't be used to "send" a coroutine to a worker thread, because the compiler might save the address of a thread-local variable across the thread switch (happened in QEMU).
Yes I know, GCC has a long-standing bug open on the issue :(.
Signal mask? What century are we in?
It can be safely ignored for the vast majority of apps. If you're using multithreading (quite likely if you're doing coroutines), then signals are not a good fit anyway.
Aside from the fact that the signal mask is still relevant in 2026 and even for multithreaded programs, that doesn't have anything to do with the fact that POSIX requires swapcontext to preserve it.
In most cases you're already using signalfd in places where libdex runs.