> a native stack [is] just a form of memory allocator

There is a lot riding on that “just”. Hardware stacks are very, very unlike heap memory allocators in pretty much every possible way other than “both systems provide access to memory.”

Tons and tons of embedded code assumes the stack is, indeed, a hardware stack. It’s far from trivial to make that code “just use a dummy/static allocator with the same api as a heap”; that code may not be in Rust, and it’s ubiquitous for embedded code to not be written with abstractions in front of its allocator—why would it do otherwise, given that tons of embedded code was written for a specific compiler+hardware combination with a specific (and often automatic or compiler-assisted) stack memory management scheme? That’s a bit like complaining that a specific device driver doesn’t use a device-agnostic abstraction.

During the design phase of Rust async there were no async embedded code written to be inspired. For systems with tight memory budget it is common to pre-allocate everything often using custom bump allocation or split memory into few regions for fixed-sized things and allocate from those.

And then the need to poll features by the runtime means that async in Rust requires non-trivial runtime going against the desire to avoid abstractions in the embedded.

Async without polling while stack-unfriendly requires less runtime. And if Rust supported type-safe region-based allocators when a bunch of things are allocated one by one and then released at once it could be a better fit for the embedded world.