> The Rust folks adopted async with callbacks

Rust's async is not based on callbacks, it's based on polling. So really there are three ways to implement async:

- The callback approach used by e.g. Node.js and Swift, where a function that may suspend accepts a callback as an argument, and invokes the callback once it is ready to make progress. The compiler transforms async/await code into continuation-passing style.

- The stackful approach used by e.g. Go, libtask, and this project, where a runtime switches between green threads when a task is ready to make progress. It is simple and easy to implement, but introduces complexity around stack size.

- Rust's polling approach: an async task is statically transformed into a state machine object that is polled by a runtime when it's ready to make progress.
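To make the polling model concrete, here is a rough hand-written sketch of the kind of state machine the compiler generates for an async function with one suspension point. `AddLater`, `NoopWaker` and `run_to_completion` are made-up names for illustration, and the busy-polling loop stands in for a real runtime, which would only poll again after the waker fires.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical task: each variant holds exactly the data that must
/// survive until the next poll.
enum AddLater {
    Start { a: u32, b: u32 },
    Waiting { sum: u32 },
    Done,
}

impl Future for AddLater {
    type Output = u32;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
        // AddLater contains no self-references, so it is Unpin and
        // get_mut is allowed.
        let this = self.get_mut();
        match *this {
            AddLater::Start { a, b } => {
                // Record progress, ask to be polled again, and suspend.
                *this = AddLater::Waiting { sum: a + b };
                cx.waker().wake_by_ref();
                Poll::Pending
            }
            AddLater::Waiting { sum } => {
                *this = AddLater::Done;
                Poll::Ready(sum)
            }
            AddLater::Done => panic!("polled after completion"),
        }
    }
}

/// Waker that does nothing; enough for a loop that polls unconditionally.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Toy "runtime": polls until the future is ready and counts the polls.
fn run_to_completion<F: Future + Unpin>(mut fut: F) -> (F::Output, usize) {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut polls = 0;
    loop {
        polls += 1;
        if let Poll::Ready(out) = Pin::new(&mut fut).poll(&mut cx) {
            return (out, polls);
        }
    }
}

fn main() {
    let (value, polls) = run_to_completion(AddLater::Start { a: 2, b: 3 });
    println!("value = {value}, polls = {polls}");
}
```

Note that nothing here hands a callback to the suspended operation: the task just returns `Poll::Pending` and the runtime decides when to call `poll` again.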

Each approach has its advantages and disadvantages. Continuation-passing style doesn't require a runtime to manage tasks, but each call site must capture local variables into a closure, which tends to require a lot of heap allocation and copying. (You could also use Rust's generic closures, but that would massively bloat code size and compile times, because every suspending function must be specialized for each call site.) So it's not really acceptable for applications that need maximum performance and control over allocations.
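For comparison, here is a minimal sketch of what continuation-passing style looks like if you write it by hand in Rust with boxed closures. All names (`Callback`, `read_number`, `sum_two`, `run_sum_two`) are hypothetical, and the "suspending" operation calls its callback immediately instead of waiting for real I/O.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical callback type: the rest of the computation after a suspension.
type Callback<T> = Box<dyn FnOnce(T)>;

/// Hypothetical suspending operation. A real one would store `k` and invoke
/// it when the I/O completes; this stand-in calls it right away.
fn read_number(source: u32, k: Callback<u32>) {
    k(source * 10);
}

/// Reads two numbers and passes their sum to `k`.
fn sum_two(a: u32, b: u32, k: Callback<u32>) {
    // Everything needed after the first suspension (`b` and `k`) is moved
    // into a heap-allocated closure.
    read_number(
        a,
        Box::new(move |x| {
            // The second suspension needs another allocation that
            // captures `x` and `k`.
            read_number(b, Box::new(move |y| k(x + y)));
        }),
    );
}

/// Helper that runs `sum_two` and returns whatever the final callback received.
fn run_sum_two(a: u32, b: u32) -> Option<u32> {
    let slot = Rc::new(Cell::new(None));
    let slot_in_callback = Rc::clone(&slot);
    sum_two(a, b, Box::new(move |total| slot_in_callback.set(Some(total))));
    slot.get()
}

fn main() {
    println!("{:?}", run_sum_two(1, 2));
}
```

Each `Box::new` is one heap allocation per suspension point. Replacing `Callback<T>` with a generic `impl FnOnce(T)` parameter would avoid the allocations, at the cost of monomorphizing every function for every distinct closure passed to it.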

Stackful coroutines require managing stacks. Allocating large stacks is very expensive in terms of performance and memory usage; it won't scale to thousands or millions of tasks and largely negates the benefits of green threading. Allocating small stacks means you need the ability to resize stacks dynamically at runtime, which requires dynamic allocation and adds significant performance and complexity overhead if you want to make an FFI call from an asynchronous task. (In Go, every function begins with a prologue that checks whether there is enough stack space and allocates more if needed; since foreign functions do not have this prologue, an FFI call requires switching to a sufficiently large stack.) This project uses fixed-size task stacks, customizable per task but defaulting to 256K [1]. This default is several orders of magnitude larger than a typical task size in other green-threading runtimes, so to achieve large scale the programmer must manually manage the stack size on a per-task basis, and face stack overflows if they guess wrong (potentially only in rare/edge cases).
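As a back-of-the-envelope illustration of the scaling concern, the sketch below multiplies task count by stack size. It assumes "256K" means 256 KiB, and the 2 KiB figure is only a hypothetical small stack of the sort a growable-stack runtime might start from. The numbers are reserved address space; how much becomes resident depends on how the stacks are mapped and used.

```rust
const KIB: u64 = 1024;
const MIB: u64 = 1024 * KIB;

/// Address space reserved for `tasks` fixed-size stacks of `stack_bytes` each.
fn total_stack_bytes(tasks: u64, stack_bytes: u64) -> u64 {
    tasks * stack_bytes
}

fn main() {
    for tasks in [1_000u64, 1_000_000] {
        // Assumption: the 256K default is 256 KiB.
        let default_stacks = total_stack_bytes(tasks, 256 * KIB);
        // Assumption: hypothetical 2 KiB stack for comparison.
        let small_stacks = total_stack_bytes(tasks, 2 * KIB);
        println!(
            "{tasks} tasks: {} MiB at 256 KiB each, {} MiB at 2 KiB each",
            default_stacks / MIB,
            small_stacks / MIB
        );
    }
}
```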

Rust's "stackless" polling-based approach means the compiler knows statically exactly how much persistent storage a suspended task needs, so the application or runtime can allocate this storage up-front and never need to resize it, while a running task has a full OS thread stack available as scratch space and for FFI. It doesn't require dynamic memory allocation, but it imposes limits on things like recursion, since a task that contains itself would have no finite static size. Rust initially had stackful coroutines, but they were dropped to avoid requiring dynamic allocation and to remove the FFI overhead.
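Both points can be seen in a small sketch: `size_of_val` reports the storage a suspended task needs, and a recursive async function only works once each level is boxed. `holds_buffer`, `countdown`, `yield_point` and the toy `block_on` are made-up names for illustration, and the busy-polling executor is only suitable for futures that never actually wait.

```rust
use std::future::Future;
use std::mem::size_of_val;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Stand-in suspension point; it completes immediately.
async fn yield_point() {}

/// `buf` is live across the await, so it has to be stored inside the future.
async fn holds_buffer(seed: u8) -> u8 {
    let buf = [seed; 1024];
    yield_point().await;
    buf[usize::from(seed) % buf.len()]
}

/// A directly recursive `async fn` would need to contain its own future,
/// so each level is boxed here, which is a dynamic allocation.
fn countdown(n: u32) -> Pin<Box<dyn Future<Output = u32>>> {
    Box::pin(async move {
        if n == 0 {
            0
        } else {
            1 + countdown(n - 1).await
        }
    })
}

struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Toy executor: polls in a loop until the future is ready.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

fn main() {
    let fut = holds_buffer(7);
    // Known before the task ever runs; includes the 1024-byte buffer.
    println!("holds_buffer future: {} bytes", size_of_val(&fut));
    println!("result: {}", block_on(fut));

    let boxed = countdown(3);
    // Only the pointer is counted here; the state lives on the heap.
    println!("boxed countdown handle: {} bytes", size_of_val(&boxed));
    println!("countdown: {}", block_on(boxed));
}
```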

The async support in Zig's standard library, once it's complete, is supposed to let the application developer choose between stackful and stackless coroutines depending on the needs of the application.

[1]: https://github.com/lalinsky/zio/blob/9e2153eed99a772225de9b2...