I have some test code that compares pre-async Hyper (i.e. thread per request) against async Hyper (via Tokio), and the pre-async version processes more requests per second in every scenario (I/O-bound, CPU-heavy tasks, shared memory).

I'll publish my results shortly. I ran these as baselines because I'm working on finishing the User Managed Concurrency Groups (UMCG) proposal for the Linux kernel, an extension that provides faster kernel threads (which beat both of them).
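For anyone unfamiliar with the thread-per-request model used as the baseline above, here is a minimal sketch using only the standard library. It is not Hyper and not my benchmark code; `handle` and `serve` are hypothetical names, and the fixed response is just a placeholder.

```rust
use std::io::{Read, Write};
use std::net::{TcpListener, TcpStream};
use std::thread;

/// Handles one connection with plain blocking I/O:
/// read whatever the client sent, then write a fixed HTTP response.
fn handle(mut stream: TcpStream) -> std::io::Result<()> {
    let mut buf = [0u8; 1024];
    let _ = stream.read(&mut buf)?; // request contents are ignored in this sketch
    stream.write_all(b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\nConnection: close\r\n\r\nok")
}

/// Accepts `max_conns` connections, spawning one OS thread per connection,
/// then waits for all handlers to finish and returns.
fn serve(listener: TcpListener, max_conns: usize) -> std::io::Result<()> {
    let mut workers = Vec::new();
    for stream in listener.incoming().take(max_conns) {
        let stream = stream?;
        // No executor, no futures: the kernel schedules each handler thread.
        workers.push(thread::spawn(move || handle(stream)));
    }
    for worker in workers {
        let _ = worker.join();
    }
    Ok(())
}
```

A real server would loop forever and usually cap the number of live threads (the 1024 figure mentioned later in the thread is that kind of cap), but the shape of the handler stays the same: ordinary blocking calls, top to bottom.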

Relevant prior work: https://github.com/jimblandy/context-switch

Thank you for this! This is really helpful.

The UMCG implementation allows kernel thread context switches to happen in 150-200 nanoseconds, compared to 1500-2000 nanoseconds for normal kernel thread context switches. My goal is to show that if UMCG were merged into the Linux kernel, it would be competitive with async Rust without the headache.
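If you want a rough feel for the per-switch cost of ordinary kernel threads on your own machine, a ping-pong between two threads is one way to get it. This is a sketch, not my benchmark: `avg_handoff` is a hypothetical helper, the threads are not pinned to a core, and the number includes channel overhead, so treat it as an upper bound rather than a pure context-switch time.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;
use std::time::{Duration, Instant};

/// Bounces a token between two OS threads `round_trips` times and returns
/// the average time per one-way handoff (channel overhead included).
fn avg_handoff(round_trips: u32) -> Duration {
    assert!(round_trips > 0, "need at least one round trip");

    // Capacity 0 makes these rendezvous channels: `send` blocks until the
    // other thread calls `recv`, so every handoff has to wake the peer.
    let (ping_tx, ping_rx) = sync_channel::<()>(0);
    let (pong_tx, pong_rx) = sync_channel::<()>(0);

    let echo = thread::spawn(move || {
        // The loop ends once the main thread drops `ping_tx`.
        while ping_rx.recv().is_ok() {
            if pong_tx.send(()).is_err() {
                break;
            }
        }
    });

    let start = Instant::now();
    for _ in 0..round_trips {
        ping_tx.send(()).unwrap();
        pong_rx.recv().unwrap();
    }
    let elapsed = start.elapsed();

    drop(ping_tx);
    echo.join().unwrap();

    // Each round trip is two handoffs: main -> echo and echo -> main.
    elapsed / (2 * round_trips)
}
```

Calling `avg_handoff(100_000)` in a release build and printing the result with `{:?}` gives a per-handoff figure you can compare against the numbers in the repository linked above.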

How many concurrent requests?

I'll have to check my work computer on Monday. It was an 8-CPU virtual machine on an M1 Mac. The UMCG and normal-thread servers were set to 1024 threads; the Tokio version used 2 threads per core. Off the top of my head, the I/O-bound requests topped out around 40k/second for the Tokio version, 60k/second for the normal Hyper version, and 80k/second for the UMCG Hyper version.

I'm pretty close to being done - I'm hoping to publish the entire GitHub repository with tests for the community to validate by next week.

UMCG is essentially an open source version of Google Fibers, their internal extension to the Linux kernel for "lightweight" threads. It requires you to build a user-space scheduler, but that lets you create different types of schedulers. I can't remember which scheduler produced the results above, but I have at least 6 different UMCG schedulers that I was testing.

So essentially you get the benefits of something like Tokio, where you can have different schedulers optimized for different use cases, plus the power of kernel threads, which means easy cancellation and easy programming (at least in Rust). It's still a Linux thread with an entire 8 MB(?) stack, but from my testing it's far faster than what Tokio can provide, without the headache of async/await programming.
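On the stack size point: the figure is tunable per thread, and the stack is reserved address space that is only backed by physical memory as it gets touched. 8 MB is the usual `ulimit -s` default on Linux, while threads spawned through Rust's standard library default to 2 MiB unless you ask for something else. A minimal sketch of asking for something else, where `run_with_stack` is a hypothetical helper name:

```rust
use std::thread;

/// Runs `work` on a new OS thread whose stack is `stack_bytes` bytes
/// (the platform may round this up to its minimum) and returns the result.
fn run_with_stack<T, F>(stack_bytes: usize, work: F) -> std::io::Result<T>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    let handle = thread::Builder::new()
        .name("small-stack-worker".into())
        .stack_size(stack_bytes)
        .spawn(work)?; // returns Err if the OS refuses to create the thread
    Ok(handle.join().expect("worker thread panicked"))
}
```

For example, `run_with_stack(64 * 1024, || 2 + 2)` runs the closure on a thread with a 64 KiB stack. Whether a small stack is safe depends entirely on how deep the handler's call chain goes, so measure before shrinking it.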

Async only exists because languages like Python and JavaScript have global interpreter locks that don't play nice with threads.

Using async for languages like Rust or C++ is cargo cult by people who don't know what the hell they're doing.

[Caveat: there's a use case for async if you're doing embedded development where you don't have threads or call stacks at all.]