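bzip2 is particularly slow because the transform it relies on, the Burrows-Wheeler transform (BWT), is "intrinsically slow": it depends on cache-unfriendly memory accesses with long dependency chains (notably when inverting the transform during decompression), which keep the CPU from extracting much instruction-level parallelism: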
https://cbloomrants.blogspot.com/2021/03/faster-inverse-bwt....
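A rough illustration of where the dependency chain comes from, as a minimal sketch and not bzip2's actual code: the textbook inverse BWT rebuilds the input by repeatedly following an "LF mapping" table. Each step's index `p` comes from the load in the previous step, so the loads cannot overlap. With bzip2-sized blocks (up to 900 KB), the table is larger than L1/L2 cache, so each step may well be a cache miss. The names `bwt`, `inverse_bwt` and `lf` here are illustrative; the forward transform is the naive sort-all-rotations version, for demonstration only.

```python
def bwt(s: str) -> tuple[str, int]:
    """Naive forward BWT (illustration only, O(n^2 log n)).

    Sorts all rotations of a non-empty string s and returns the last column
    plus the row index where the original string ended up.
    """
    n = len(s)
    rows = sorted(range(n), key=lambda i: s[i:] + s[:i])
    last = "".join(s[(i - 1) % n] for i in rows)
    return last, rows.index(0)


def inverse_bwt(last: str, primary: int) -> str:
    """Textbook inverse BWT via the LF mapping."""
    n = len(last)

    # Where each symbol's block begins in the sorted first column.
    counts: dict[str, int] = {}
    for c in last:
        counts[c] = counts.get(c, 0) + 1
    starts: dict[str, int] = {}
    total = 0
    for c in sorted(counts):
        starts[c] = total
        total += counts[c]

    # lf[i] = row whose rotation starts with the character last[i].
    seen: dict[str, int] = {}
    lf = [0] * n
    for i, c in enumerate(last):
        lf[i] = starts[c] + seen.get(c, 0)
        seen[c] = seen.get(c, 0) + 1

    # The slow part: a serial chain of data-dependent loads.
    # Each iteration needs the p produced by the previous one, and p jumps
    # around the table essentially at random, so on large blocks these
    # loads tend to miss cache and cannot be issued in parallel.
    out = [""] * n
    p = primary
    for k in range(n - 1, -1, -1):
        out[k] = last[p]
        p = lf[p]
    return "".join(out)


last, primary = bwt("banana")
print(last, primary)                  # nnbaaa 3
print(inverse_bwt(last, primary))     # banana
```

The faster inverse-BWT techniques in the linked post are aimed at this loop: either reduce the cost of each dependent access or run several independent chains at once, so the CPU has more work in flight.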