That’s true — though in my benchmarks Tokio came out as one of the slower parallelism-enabling projects. The article still included a comparison:
$ PARALLEL_REDUCTIONS_LENGTH=1536 cargo +nightly bench -- --output-format bencher
test fork_union ... bench: 5,150 ns/iter (+/- 402)
test rayon ... bench: 47,251 ns/iter (+/- 3,985)
test smol ... bench: 54,931 ns/iter (+/- 10)
test tokio ... bench: 240,707 ns/iter (+/- 921)
... but I now avoid comparing to Tokio since it doesn’t seem fair — fork-join style parallel processing isn’t really its primary use case.
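For readers unfamiliar with the term, here is a minimal sketch of what a fork-join reduction looks like, using only the standard library. This is not the benchmark code from the article: the helper name `fork_join_sum`, the thread count of 4, and the reading of the 1536 length as an element count are all assumptions made for illustration.

```rust
use std::thread;

/// Hypothetical helper: sums `data` by forking scoped worker threads
/// and joining their partial sums.
fn fork_join_sum(data: &[u64], threads: usize) -> u64 {
    // Guard against a zero thread count.
    let threads = threads.max(1);
    // Ceiling division so every element lands in exactly one chunk;
    // `max(1)` keeps `chunks` from panicking on empty input.
    let chunk_len = data.len().div_ceil(threads).max(1);
    thread::scope(|scope| {
        // Fork: spawn one worker per chunk.
        let handles: Vec<_> = data
            .chunks(chunk_len)
            .map(|chunk| scope.spawn(move || chunk.iter().sum::<u64>()))
            .collect();
        // Join: wait for each worker and combine the partial results.
        handles
            .into_iter()
            .map(|handle| handle.join().unwrap())
            .sum::<u64>()
    })
}

fn main() {
    // Assumed input size, mirroring the 1536 used in the benchmark command.
    let data: Vec<u64> = (1..=1536).collect();
    println!("{}", fork_join_sum(&data, 4));
}
```

The sketch spawns fresh OS threads on every call, so it would likely be slower than any of the pooled runtimes above for an input this small; it only shows the shape of the workload being compared.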
That's outrageous. I don't agree with your assessment, though: smol is in the same niche as Tokio (that is, an async executor, which isn't necessarily optimized for CPU-bound workloads), and it isn't nearly as slow.
I think performance is a critical property for Rust infrastructure. One can only hope that newer Tokio versions address the overheads that make everything built on it slower than necessary.