> If you allocate too much stack in advance, you end up being not much cheaper than OS threads;

Maybe. A smart event loop could track how many frames are in flight at any given time and reuse preallocated frames once the tasks that own them complete.
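
To make that concrete, here is a rough sketch of the bookkeeping I have in mind. It is not taken from any real runtime: `FramePool`, `acquire`, `release`, and the 4096-byte frame size are all names and numbers I made up for illustration, and a `bytearray` stands in for a real stack frame.

```python
class FramePool:
    """Hypothetical pool of fixed-size frames that an event loop could reuse."""

    def __init__(self, frame_size: int, prealloc: int) -> None:
        self.frame_size = frame_size
        # Frames allocated up front and currently idle.
        self._free = [bytearray(frame_size) for _ in range(prealloc)]
        self.in_flight = 0          # frames currently owned by running tasks
        self.peak_in_flight = 0     # high-water mark, useful for sizing the pool
        self.allocations = prealloc # total frames ever allocated

    def acquire(self) -> bytearray:
        # Prefer an idle preallocated frame; allocate only when the pool is empty.
        if self._free:
            frame = self._free.pop()
        else:
            frame = bytearray(self.frame_size)
            self.allocations += 1
        self.in_flight += 1
        self.peak_in_flight = max(self.peak_in_flight, self.in_flight)
        return frame

    def release(self, frame: bytearray) -> None:
        # Called when the owning task finishes; the frame becomes reusable.
        self.in_flight -= 1
        self._free.append(frame)


pool = FramePool(frame_size=4096, prealloc=2)
first = pool.acquire()
second = pool.acquire()
pool.release(first)
third = pool.acquire()  # comes from the free list, no new allocation
```

The point is that memory cost follows the peak number of frames in flight rather than the total number of tasks ever started, so the up-front allocation can stay small as long as frames turn over quickly.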