It is still a bit amazing to me that it was significantly easier to do coroutines in Sigma 5 assembly and likely most any assembly than in C or C++. Two languages supposedly close to the machine.

I have seen a pure C/C++ implementation of coroutines (it used setjmp/longjmp, and memcpy to copy stacks in and out of the native arena). Not the most portable of constructions, but it worked absurdly well.

Being able to write "async" code essentially in-line is a superpower.

Agree but nowhere as easy as what we did https://ciex-software.com/coroutines.html