Threads as they are conventionally considered are inadequate. Operating systems should offer something along the lines of scheduler activations[0]: a low-level mechanism that represents individual cores being scheduled/allocated to programs. Async is responsive simply because it conforms to the (asynchronous) nature of hardware events. Similarly, threads are most performant if leveraged according to the usage of hardware cores. A program that spawns 100 threads on a system with 10 physical cores is just going to have threads interrupting each other for no reason; each core can only do so much work in a time frame, whether it's running 1 thread or 10. The most performant/efficient abstraction is a state machine[1] per core. However, for some loss of performance and (arguable) ease of development, threads can be used on top of scheduler activations[2]. Async on top of threads is just the worst of both worlds. Think in terms of the hardware resources and events (memory accesses too), and the abstractions write themselves.

[0] https://en.wikipedia.org/wiki/Scheduler_activations, https://dl.acm.org/doi/10.1145/121132.121151 | Akin to thread-per-core

[1] Stackless coroutines and event-driven programming

[2] User-level virtual/green threads today, plus responsiveness to blocking I/O events
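To make the "state machine per core" idea concrete, here is a minimal sketch in Rust using only the standard library. It is not scheduler activations: it only approximates thread-per-core on a 1:1 kernel by spawning one worker per available core, each owning its own event queue and state. The names `Event`, `CoreState`, `spawn_workers`, and `run` are made up for illustration. Note the assumption that the OS keeps each worker on its own core; std does not pin threads, and pinning is platform-specific, so it is left out.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread::{self, JoinHandle};

/// Hypothetical events delivered to a per-core loop.
enum Event {
    Work(u64),
    Shutdown,
}

/// State owned by exactly one worker; no locks needed because nothing is shared.
#[derive(Default)]
struct CoreState {
    sum: u64,
}

impl CoreState {
    /// Advance the state machine by one event. Returns false when the loop should stop.
    fn on_event(&mut self, ev: &Event) -> bool {
        match ev {
            Event::Work(n) => {
                self.sum += n;
                true
            }
            Event::Shutdown => false,
        }
    }
}

/// Spawn `n` workers, each with its own channel and its own state.
fn spawn_workers(n: usize) -> Vec<(Sender<Event>, JoinHandle<CoreState>)> {
    (0..n)
        .map(|_| {
            let (tx, rx) = channel::<Event>();
            let handle = thread::spawn(move || {
                let mut state = CoreState::default();
                // Event loop: block until an event arrives, then step the state machine.
                for ev in rx {
                    if !state.on_event(&ev) {
                        break;
                    }
                }
                state
            });
            (tx, handle)
        })
        .collect()
}

/// Distribute `events` work items round-robin over `workers` loops and sum the results.
/// `workers` must be at least 1.
fn run(workers: usize, events: u64) -> u64 {
    let pool = spawn_workers(workers);
    for i in 0..events {
        let (tx, _) = &pool[(i as usize) % workers];
        tx.send(Event::Work(i)).unwrap();
    }
    let mut total = 0;
    for (tx, handle) in pool {
        tx.send(Event::Shutdown).unwrap();
        total += handle.join().unwrap().sum;
    }
    total
}

fn main() {
    // One worker per core the OS reports as available, rather than one per task.
    let cores = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    let total = run(cores, 1000);
    println!("{cores} workers, sum = {total}");
}
```

The point of the sketch is the shape: the worker count follows the hardware, not the number of tasks, and each worker reacts to events instead of blocking per task. With real scheduler activations the kernel would also tell the program when a core is granted or taken away, which this version has no way to observe.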

Haven't scheduler activations largely been abandoned in the BSD and Linux kernels?

Yes; my understanding is that, for kernels designed for 1:1 threading, scheduler activations are an invasive change and not preferred by developers. Presumably, an operating system designed around scheduler activations would be able to better integrate them into applications, possibly even binary-compatibly with existing applications expecting 1:1 threading.