The OS allocates your thread's stack in much the same way that a coroutine runtime allocates a coroutine's stack. On each context switch the OS swaps the stack pointer and a bunch of other state; the coroutine runtime also swaps the stack pointer and some other state. It's really the same thing. The only difference is that the runtime in a compiled language knows more about your code than the OS does, so it can make assumptions the OS can't (for example, that only the callee-saved registers need saving at a yield point), and that's what makes user-space coroutines lighter. The mechanisms are the same.
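
To make the "swap the stack pointer and some other things" part concrete, here is a rough sketch of a stackful coroutine in C using the old `ucontext` API. It assumes Linux with glibc (the API is obsolescent in POSIX), and the names `counter_body`, `run_stackful_demo` and `CORO_STACK_SIZE` are made up for illustration. It is not how any particular runtime does it, just the same mechanism in miniature:

```c
#include <assert.h>
#include <stdlib.h>
#include <ucontext.h>

#define CORO_STACK_SIZE (64 * 1024)   /* arbitrary size chosen for the sketch */

static ucontext_t main_ctx;    /* saved registers and stack pointer of the caller */
static ucontext_t coro_ctx;    /* saved registers and stack pointer of the coroutine */
static int yielded_value;      /* value handed back to the caller at each yield */
static int coro_done;          /* set once the coroutine body has finished */

/* Coroutine body. It runs on its own heap-allocated stack; every
   swapcontext() saves its registers and restores the caller's. */
static void counter_body(void)
{
    for (int i = 1; i <= 3; i++) {
        yielded_value = i;
        swapcontext(&coro_ctx, &main_ctx);   /* yield to the caller */
    }
    coro_done = 1;
    /* Returning here resumes uc_link, i.e. main_ctx. */
}

/* Allocates a stack, runs the coroutine to completion, and returns
   the sum of the values it yielded. */
int run_stackful_demo(void)
{
    void *stack = malloc(CORO_STACK_SIZE);
    assert(stack != NULL);

    getcontext(&coro_ctx);
    coro_ctx.uc_stack.ss_sp = stack;          /* the coroutine's private stack */
    coro_ctx.uc_stack.ss_size = CORO_STACK_SIZE;
    coro_ctx.uc_link = &main_ctx;             /* where to go when the body returns */
    makecontext(&coro_ctx, counter_body, 0);

    int sum = 0;
    coro_done = 0;
    for (;;) {
        swapcontext(&main_ctx, &coro_ctx);    /* resume the coroutine */
        if (coro_done)
            break;
        sum += yielded_value;
    }

    free(stack);
    return sum;
}
```

Calling `run_stackful_demo()` from a `main` resumes the coroutine four times and adds up the three values it yields. As far as I know, glibc's `swapcontext` also saves and restores the signal mask, which is the kind of general-purpose work a language runtime can skip when it knows it isn't needed.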

And the stackless runtime will use a register other than the stack pointer to access the coroutine's activation frame, which leaves the stack pointer free for OS and library use and avoids the many drawbacks of fiddling with the system stack the way stackful coroutines do. It's the same thing.
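
For comparison, a hand-rolled sketch of the stackless version in C, roughly the shape a compiler might lower a coroutine into (an assumption on my part, not any specific compiler's output). The names `counter_frame`, `counter_create`, `counter_resume` and `run_stackless_demo` are hypothetical. The activation frame lives on the heap and is reached through an ordinary pointer, so the stack pointer is never touched:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical activation frame: everything that must survive a suspension. */
struct counter_frame {
    int state;   /* which resume point to jump to */
    int i;       /* the coroutine's "local" loop variable */
    int limit;   /* last value to yield */
};

/* Allocates the frame on the heap instead of reserving a separate stack. */
struct counter_frame *counter_create(int limit)
{
    struct counter_frame *f = malloc(sizeof *f);
    assert(f != NULL);
    f->state = 0;
    f->i = 0;
    f->limit = limit;
    return f;
}

/* Resumes the coroutine. Returns 1 and stores the next value in *out,
   or returns 0 once it has finished. The frame pointer f is passed like
   any other argument; the call runs on the caller's own stack. */
int counter_resume(struct counter_frame *f, int *out)
{
    switch (f->state) {
    case 0:                      /* first entry: initialise the locals */
        f->i = 1;
        f->state = 1;
        /* fall through */
    case 1:                      /* resume point inside the loop */
        if (f->i <= f->limit) {
            *out = f->i++;
            return 1;            /* suspend: a plain function return */
        }
        f->state = 2;
        /* fall through */
    default:                     /* finished */
        return 0;
    }
}

/* Drives the coroutine to completion and returns the sum of its values. */
int run_stackless_demo(void)
{
    struct counter_frame *f = counter_create(3);
    int sum = 0;
    int v;
    while (counter_resume(f, &v))
        sum += v;
    free(f);
    return sum;
}
```

Suspending here is just a `return` and resuming is just a call, so all that needs saving is whatever was written into the frame. The trade-off in this sketch is that every value living across a suspension has to be moved into the frame by hand, which is the bookkeeping a compiler would normally do for you.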