Threads are a scheduling model that delegates to the OS scheduler. Async style provides a primitive for creating a custom scheduler but is not a scheduler per se.
To use a custom scheduler you must first disable the schedulers your code uses by default, for both execution and I/O -- which means no OS scheduling. A thread-per-core architecture with static allocation and direct userspace I/O is the idiomatic way to do this, regardless of programming language.
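To make "static allocation, no cross-thread scheduling" concrete, here is a minimal Python sketch (class and method names are invented for illustration): one worker thread per core, each owning a private queue, with work partitioned statically by key so a given key always lands on the same worker and no work stealing or rebalancing ever happens. A real implementation would also pin each thread to its core (e.g. `os.sched_setaffinity` on Linux) and use direct userspace I/O (io_uring, DPDK, SPDK), both omitted here.

```python
import os
import queue
import threading

class ThreadPerCore:
    """One worker thread per core, each with a private queue.

    Work is statically partitioned by key: the same key always
    runs on the same worker, so no cross-thread coordination or
    work stealing is needed.
    """

    def __init__(self, n_workers=None):
        self.n = n_workers or os.cpu_count()
        self.queues = [queue.Queue() for _ in range(self.n)]
        self.results = queue.Queue()
        self.workers = [
            threading.Thread(target=self._run, args=(i,), daemon=True)
            for i in range(self.n)
        ]
        for w in self.workers:
            w.start()

    def _run(self, worker_id):
        # A real thread-per-core runtime would pin here, e.g. on Linux:
        #   os.sched_setaffinity(0, {worker_id})
        q = self.queues[worker_id]
        while True:
            item = q.get()
            if item is None:  # shutdown sentinel
                return
            key, fn = item
            self.results.put((key, worker_id, fn()))

    def submit(self, key, fn):
        # Static partitioning: key -> worker, decided once, never rebalanced.
        self.queues[hash(key) % self.n].put((key, fn))

    def shutdown(self):
        for q in self.queues:
            q.put(None)
        for w in self.workers:
            w.join()
```

Because the partition is static, anything keyed the same way (a shard, a connection, a table region) shares a thread and needs no locks between operations on that key.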
Optimal scheduling is a profoundly intractable problem -- arguably AI-complete. A generic scheduler will always be deeply suboptimal because even a remotely decent schedule isn't practically computable in real systems: a more optimal scheduler would have to continuously rewrite the selection and ordering of thousands of concurrent operations in real time, driven by a model that sees across all operations globally and accurately predicts both future operations that haven't happened yet and the ordering dependencies between current and future operations. A modern system can handle tens of millions of these operations per second, so each scheduling decision must also be extremely cheap.
A generic scheduler has to allow for almost arbitrary operation graphs and behavior. However, if you are writing e.g. a database engine, you have almost the entire context of how operations relate to each other, both concurrently and across time. Designing a somewhat optimal scheduler that only has to understand your code becomes computationally feasible. It isn't trivial -- scheduler design is properly difficult -- but async style is the primitive you build it with.
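To sketch what "a scheduler that only understands your code" might look like (the task kinds and priority policy here are invented for illustration): tasks are generators that yield at their suspension points -- that is the async-style primitive -- and the scheduler decides what runs next using application knowledge a generic scheduler cannot have, such as "commits unblock other transactions, so they always run before background scans."

```python
import heapq
from itertools import count

# Domain-specific policy: the application knows commits unblock other
# work, reads are latency-sensitive, and scans are background noise.
# A generic scheduler has no way to know this.
PRIORITY = {"commit": 0, "read": 1, "scan": 2}

class DbScheduler:
    """Cooperative scheduler over generator-based tasks.

    Each task yields at its suspension points; the scheduler picks
    what runs next using the domain policy above rather than a
    generic fairness heuristic.
    """

    def __init__(self):
        self.ready = []     # heap of (priority, seq, kind, generator)
        self.seq = count()  # tie-breaker: FIFO within a priority class

    def spawn(self, kind, gen):
        heapq.heappush(self.ready, (PRIORITY[kind], next(self.seq), kind, gen))

    def run(self):
        trace = []
        while self.ready:
            prio, _, kind, gen = heapq.heappop(self.ready)
            try:
                next(gen)  # run the task until its next yield point
                trace.append(kind)
                # Task suspended: requeue it under the same policy.
                heapq.heappush(self.ready, (prio, next(self.seq), kind, gen))
            except StopIteration:
                pass  # task finished
        return trace

def op(steps):
    """A stand-in task that suspends `steps` times."""
    for _ in range(steps):
        yield
```

Spawn order is irrelevant here: even if a scan is queued first, every commit step runs before any scan step, because the policy -- not arrival order -- decides the schedule. Swapping the `PRIORITY` table (or replacing the heap with dependency-aware selection) changes the entire schedule without touching the tasks.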
That’s not what I asked.
I'm going to hop in and say this would be a good exercise for you, instead. The industry has, in general, decided upon stackless threads and other async systems.
What does "I/O optimized scheduling" look like to you, and does it end up with the same sort of compiler hints, like "async / await"? Or is it different?