On the other hand, early Rust also had, for instance, a tracing garbage collector; it's far from obvious to me how relevant its discarded design decisions are supposed to be to the language as it is today.
This one is relevant because it avoids heap allocation while running the iterator and the for loop body concurrently, which is exactly the kind of thing that `async` does.
It avoids heap allocation in some situations. But in principle the exact same optimization could be done for stackful coroutines. Heck, right now in C I could stack-allocate an array and hand it to pthread_create (via pthread_attr_setstack) as the stack for a new thread. To avoid an overlarge allocation I would need to know exactly how much stack is needed, but this is exactly the knowledge the Rust compiler already requires for async/await.
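To make the "compiler already knows the size" point concrete, here is a minimal sketch. The functions `step`, `pipeline`, and `future_size` are made up for illustration; the only assumption is that an `async fn` compiles to a fixed-size state machine that has to store whatever stays live across an `.await`.

```rust
use std::mem::size_of_val;

// Hypothetical leaf operation; it never actually suspends.
async fn step(x: u32) -> u32 {
    x + 1
}

// Hypothetical task. `buf` is read after the first `.await`,
// so it has to be stored inside the future's state machine.
async fn pipeline(buf: [u8; 64]) -> u32 {
    let a = step(buf[0] as u32).await;
    let b = step(buf[63] as u32).await;
    a + b
}

// Creating the future runs nothing and allocates nothing on the heap;
// `fut` is an ordinary stack value whose size is fixed at compile time.
fn future_size() -> usize {
    let fut = pipeline([0u8; 64]);
    size_of_val(&fut)
}

fn main() {
    // The exact number depends on the compiler version,
    // but it is at least the 64 bytes of `buf`.
    println!("pipeline future occupies {} bytes", future_size());
}
```

That number is the async/await analogue of "exactly how much stack is needed" for the pthread case.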
What people care about is semantics. async/await leaks implementation details. One of the reasons Rust does it the way it currently does is that the implementation avoids requiring support from, e.g., LLVM, which might need some feature work to support a deeper level of async integration without losing the benefits of the current implementation. Rust has a few warts like this, where semantics are stilted in order to confine the implementation work to the high-level Rust compiler.
> in principle the exact same optimization could be done for stackful coroutines.
Yes, I totally agree, and this is sort of what I imagine a better design would look like.
> One of the reasons Rust does it the way it currently does is because the implementation avoids requiring support from, e.g., LLVM
This I would argue is simply a failure of imagination. All you need from the LLVM layer is tail calls, and then you can manage the stack layout yourself in essentially the same way Rust manages Future layout.
You don't even need arbitrary tail calls. When transferring control between tasks, the compiler can limit itself to the sorts of things LLVM asks for (a specific calling convention, matching function signatures, etc.), because it can store most of the state in the stack that it laid out itself.
In order to know for sure how much stack is needed (or to replace the stack with a static allocation, which used to be common on older machines and is still common today in deeply embedded code, and even on GPUs!), you must ensure that any functions you call within your thread are non-reentrant, or else that they resort to an auxiliary stack-like allocation if reentrancy is required. This is a fundamental constraint (not something limited to current LLVM), and in practice it leads you right back into the "what color are your functions?" world.
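Rust's async already shows this constraint. A future that directly contained its own recursive call would have no finite size, so each level has to go through some indirection, typically a heap box, which is the "auxiliary stack-like allocation" above. A minimal sketch follows; `depth_sum`, `NoopWake`, and `run_ready` are hypothetical helpers written for this example, and `run_ready` is only a toy driver for futures that never actually suspend, not a real executor.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Waker that does nothing; enough for futures that are always ready.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

// Each recursion level is boxed: the heap-allocated chain of futures
// plays the role that stack frames would play in a stackful design.
fn depth_sum(n: u64) -> Pin<Box<dyn Future<Output = u64>>> {
    Box::pin(async move {
        if n == 0 {
            0
        } else {
            n + depth_sum(n - 1).await
        }
    })
}

// Toy driver: polls in a loop until the future completes.
fn run_ready<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = std::pin::pin!(fut);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
    }
}

fn main() {
    println!("{}", run_ready(depth_sum(4))); // prints 10
}
```

The return type changes from an anonymous, statically sized future to `Pin<Box<dyn Future>>`, so the recursion is visible in the signature, which is the coloring problem in miniature.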