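Compared to Rayon or Taskflow, the biggest initial win is cutting out the heap allocations for all the promise/result objects. Once many threads hammer the allocator at the same time, it becomes a shared contention point, so every one of those allocations effectively behaves like taking a mutex.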
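To make that concrete, here is a rough C++ sketch of the two ways of passing results back. It is not code from Rayon, Taskflow, or any real thread pool; `sum_range`, `sum_with_futures`, and `sum_with_slots` are hypothetical names made up for illustration. The first variant uses `std::async`, where each call creates a shared state for its result (in common standard library implementations that is a heap allocation per task). The second writes into slots allocated once, before any worker starts.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <future>
#include <thread>
#include <vector>

// Hypothetical workload: sum of the integers in [begin, end).
inline std::uint64_t sum_range(std::uint64_t begin, std::uint64_t end) {
    std::uint64_t total = 0;
    for (std::uint64_t i = begin; i < end; ++i) total += i;
    return total;
}

// Future-based variant: every std::async call creates its own shared state
// to carry the result (typically a separate heap allocation per task).
inline std::uint64_t sum_with_futures(std::uint64_t n, std::size_t tasks) {
    assert(tasks > 0);
    std::vector<std::future<std::uint64_t>> futures;
    futures.reserve(tasks);
    for (std::size_t t = 0; t < tasks; ++t)
        futures.push_back(std::async(std::launch::async, sum_range,
                                     n * t / tasks, n * (t + 1) / tasks));
    std::uint64_t total = 0;
    for (auto &f : futures) total += f.get();
    return total;
}

// Slot-based variant: all result storage is allocated once, up front.
// Workers only write to their own slot, so no result objects are
// allocated while the tasks are running.
inline std::uint64_t sum_with_slots(std::uint64_t n, std::size_t tasks) {
    assert(tasks > 0);
    std::vector<std::uint64_t> slots(tasks, 0); // single allocation
    std::vector<std::thread> workers;
    workers.reserve(tasks);
    for (std::size_t t = 0; t < tasks; ++t)
        workers.emplace_back([&slots, n, tasks, t] {
            slots[t] = sum_range(n * t / tasks, n * (t + 1) / tasks);
        });
    for (auto &w : workers) w.join(); // join makes the slot writes visible
    std::uint64_t total = 0;
    for (std::uint64_t v : slots) total += v;
    return total;
}
```

Caveat: the sketch still spawns a fresh `std::thread` per task, which has its own costs; a real pool would keep its workers alive and reuse them. The only thing it is meant to show is where the per-task result allocation goes away.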

Hard to rank the rest without a proper breakdown. If I ever tried, I’d probably end up writing a paper — and I’d rather write code :)