The root cause of the issue is that musl's malloc uses a single heap and relies on locking to support multiple threads. This means each allocation/free must acquire this lock. Imo that's fine for single-threaded programs (which might've been musl's main use case), but Rust programs nowadays mostly use multiple threads.
In contrast, mimalloc, a similarly minimalistic allocator, has a per-thread heap, with each thread owning the memory it allocates; cross-thread frees are handled in a deferred manner.
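To make the contention point concrete, here is a toy sketch (not musl's actual code): a shared pool of block ids behind one `Mutex`, where every alloc and free has to take the same lock. `GlobalHeap` and `churn` are made-up names for illustration only.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical toy "heap": a pool of fixed-size block ids behind ONE lock.
/// This only models the locking pattern, not a real allocator.
struct GlobalHeap {
    free_blocks: Mutex<Vec<usize>>,
}

impl GlobalHeap {
    fn new(blocks: usize) -> Self {
        GlobalHeap { free_blocks: Mutex::new((0..blocks).collect()) }
    }

    /// Every allocation takes the shared lock.
    fn alloc(&self) -> Option<usize> {
        self.free_blocks.lock().unwrap().pop()
    }

    /// Every free takes the same shared lock.
    fn free(&self, block: usize) {
        self.free_blocks.lock().unwrap().push(block);
    }
}

/// Spawns `threads` workers that each alloc/free `iters` times against the
/// shared heap, then returns how many blocks are free at the end.
fn churn(threads: usize, iters: usize) -> usize {
    // One block per thread, so an alloc can never find the pool empty.
    let heap = Arc::new(GlobalHeap::new(threads));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let heap = Arc::clone(&heap);
            thread::spawn(move || {
                for _ in 0..iters {
                    let block = heap.alloc().expect("pool has one block per thread");
                    heap.free(block);
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let remaining = heap.free_blocks.lock().unwrap().len();
    remaining
}
```

Calling `churn(4, 1000)` is correct but all four threads serialize on `free_blocks`; adding threads adds contention rather than throughput.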
This works very well with Rust's ownership system, where objects rarely move between threads.
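A rough sketch of the per-thread idea, under loud assumptions: `ThreadHeap`, `RemoteFree` and `cross_thread_free_demo` are invented names, and I use a std `mpsc` channel as the deferred-free queue purely for simplicity. mimalloc's real mechanism is different (as far as I know it uses atomic lists), so treat this as an illustration of the pattern only.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;

/// Hypothetical per-thread heap: the owner allocates and frees without locks.
struct ThreadHeap {
    local_free: Vec<usize>,     // touched only by the owning thread
    remote_rx: Receiver<usize>, // blocks freed by other threads arrive here
    remote_tx: Sender<usize>,   // cloned into handles given to other threads
}

/// Handle another thread uses to return a block to its owner.
struct RemoteFree(Sender<usize>);

impl RemoteFree {
    /// Deferred free: just enqueue the block, the owner reclaims it later.
    fn free(&self, block: usize) {
        let _ = self.0.send(block);
    }
}

impl ThreadHeap {
    fn new(blocks: usize) -> Self {
        let (remote_tx, remote_rx) = channel();
        ThreadHeap { local_free: (0..blocks).collect(), remote_rx, remote_tx }
    }

    fn remote_handle(&self) -> RemoteFree {
        RemoteFree(self.remote_tx.clone())
    }

    /// Fast path pops from the local list; only when it is empty do we
    /// drain whatever other threads have handed back.
    fn alloc(&mut self) -> Option<usize> {
        if self.local_free.is_empty() {
            self.local_free.extend(self.remote_rx.try_iter());
        }
        self.local_free.pop()
    }

    /// Same-thread free: no synchronization at all.
    fn free(&mut self, block: usize) {
        self.local_free.push(block);
    }
}

/// Allocates the only block, frees it from another thread, and shows that
/// the owner gets it back on its next allocation.
fn cross_thread_free_demo() -> (Option<usize>, Option<usize>) {
    let mut heap = ThreadHeap::new(1);
    let block = heap.alloc().unwrap();
    let exhausted = heap.alloc(); // nothing left locally or remotely
    let remote = heap.remote_handle();
    thread::spawn(move || remote.free(block)).join().unwrap();
    let reused = heap.alloc(); // picks up the deferred free
    (exhausted, reused)
}
```

The common case (alloc and free on the same thread) never touches shared state, which is why this fits programs where objects rarely change threads.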
Internally, both allocators use size-class-based allocation into predefined chunks; the key difference is that musl uses bitmaps and mimalloc uses free lists to keep track of free memory.
Musl could be fixed if they switched from a single shared heap to a per-thread heap as well.
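For anyone unfamiliar with the two bookkeeping styles, a minimal sketch of each for one chunk of equal-sized slots. `BitmapChunk` and `FreeListChunk` are hypothetical types, not taken from either codebase; a real free list usually stores the "next" link inside the free slot itself, while here it lives in a side `Vec` to keep the example safe Rust.

```rust
/// Bitmap tracking: one bit per slot, 1 = free. Limited to 32 slots here.
struct BitmapChunk {
    free_mask: u32,
}

impl BitmapChunk {
    fn new(slots: u32) -> Self {
        assert!(slots >= 1 && slots <= 32);
        let free_mask = if slots == 32 { u32::MAX } else { (1u32 << slots) - 1 };
        BitmapChunk { free_mask }
    }

    /// Finds the lowest set bit and clears it.
    fn alloc(&mut self) -> Option<u32> {
        if self.free_mask == 0 {
            return None;
        }
        let slot = self.free_mask.trailing_zeros();
        self.free_mask &= !(1 << slot);
        Some(slot)
    }

    /// Marks the slot free again by setting its bit.
    fn free(&mut self, slot: u32) {
        self.free_mask |= 1 << slot;
    }
}

/// Free-list tracking: each free slot records the next free slot.
struct FreeListChunk {
    next: Vec<Option<usize>>,
    head: Option<usize>,
}

impl FreeListChunk {
    fn new(slots: usize) -> Self {
        // Initially slot i links to slot i + 1, and the last slot ends the list.
        let next = (0..slots)
            .map(|i| if i + 1 < slots { Some(i + 1) } else { None })
            .collect();
        FreeListChunk { next, head: if slots > 0 { Some(0) } else { None } }
    }

    /// Pops the head of the list.
    fn alloc(&mut self) -> Option<usize> {
        let slot = self.head?;
        self.head = self.next[slot].take();
        Some(slot)
    }

    /// Pushes the slot back onto the head of the list.
    fn free(&mut self, slot: usize) {
        self.next[slot] = self.head;
        self.head = Some(slot);
    }
}
```

Both give constant-time alloc and free for a chunk; the bitmap keeps metadata separate from user memory, while the free list needs no search and no fixed slot limit.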
> a similarly minimalistic allocator
mimalloc has about 10kloc, while (assuming I'm looking in the right place) the new musl allocator has 891 and the old musl allocator has 518 lines of code. I wouldn't call an order of magnitude difference in line count 'similar'.
It's minimalistic in the sense that it compiles to a tiny binary (a lot of the code is either per-platform, whereas musl is POSIX-only afaik, or for debugging). Yes, it's bigger, but still tiny compared to something like jemalloc, and I'm sure it's like 10kb in a binary.
Yeah, the mimalloc design is just the correct one.