That’s true — though in my benchmarks Tokio came out as one of the slower parallelism-enabling projects. The article still included a comparison:

  $ PARALLEL_REDUCTIONS_LENGTH=1536 cargo +nightly bench -- --output-format bencher
  
  test fork_union ... bench:  5,150 ns/iter (+/- 402)
  test rayon ... bench:      47,251 ns/iter (+/- 3,985)
  test smol ... bench:       54,931 ns/iter (+/- 10)
  test tokio ... bench:     240,707 ns/iter (+/- 921)
... but I now avoid comparing to Tokio since it doesn’t seem fair — fork-join style parallel processing isn’t really its primary use case.
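
For readers unfamiliar with the term, a fork-join reduction can be sketched with nothing but the standard library. This is only an illustration of the workload shape, not the code behind any of the numbers above and not how fork_union, Rayon, smol, or Tokio implement it; `fork_join_sum` and its `threads` parameter are hypothetical names.

```rust
use std::thread;

/// Hypothetical helper: sums `data` by forking one scoped thread per chunk,
/// then joining the threads and combining their partial sums.
fn fork_join_sum(data: &[u64], threads: usize) -> u64 {
    let threads = threads.max(1);
    // Ceiling division so every element lands in some chunk; clamp to 1
    // because `chunks` panics on a chunk length of zero (empty input).
    let chunk_len = ((data.len() + threads - 1) / threads).max(1);

    thread::scope(|scope| {
        // Fork: spawn one OS thread per contiguous chunk.
        let handles: Vec<_> = data
            .chunks(chunk_len)
            .map(|chunk| scope.spawn(move || chunk.iter().sum::<u64>()))
            .collect();

        // Join: wait for every worker and fold the partial results.
        handles
            .into_iter()
            .map(|handle| handle.join().expect("worker thread panicked"))
            .sum::<u64>()
    })
}
```

Spawning fresh OS threads on every call is the naive version. The libraries in the benchmark differ mainly in how they avoid or amortize that fork and join cost, which is presumably where the gaps come from.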

That's outrageous, and I don't agree with your assessment: smol is in the same niche as Tokio (that is, an async executor, which isn't necessarily optimized for CPU-bound workloads) and isn't nearly as slow.

I think performance is a critical property for Rust infrastructure. One can only hope that newer Tokio versions address the overheads that make everyone slower than necessary.