If there is that much of a performance difference among generic allocators, it means you need semantically optimized allocators (unless performance doesn't actually matter that much in the end).

You are not wrong, and this is indeed what Zig is trying to push by making all std functions that allocate take an allocator parameter.
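
For anyone who hasn't seen the pattern, here is a rough sketch of the same idea in C. This is not Zig's actual API: `Allocator`, `heap_allocator`, `Bump`, and `dup_string` are made-up names for illustration. The point is only that the function that allocates does not pick the allocator; the caller does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical allocator interface (illustrative names, not Zig's API). */
typedef struct Allocator {
    void *(*alloc)(void *ctx, size_t size);
    void (*free)(void *ctx, void *ptr);
    void *ctx; /* allocator-specific state, passed back to the callbacks */
} Allocator;

/* General-purpose allocator: forwards to malloc/free and ignores ctx. */
static void *heap_alloc(void *ctx, size_t size) { (void)ctx; return malloc(size); }
static void heap_free(void *ctx, void *ptr) { (void)ctx; free(ptr); }
static const Allocator heap_allocator = { heap_alloc, heap_free, NULL };

/* Bump allocator over a caller-provided buffer; individual frees are no-ops.
   Assumes the buffer itself is suitably aligned for what gets stored in it. */
typedef struct Bump {
    unsigned char *buf;
    size_t cap;
    size_t used;
} Bump;

static void *bump_alloc(void *ctx, size_t size) {
    Bump *b = (Bump *)ctx;
    size_t aligned = (size + 15) & ~(size_t)15; /* round up to 16 bytes */
    if (aligned < size || aligned > b->cap - b->used) return NULL;
    void *p = b->buf + b->used;
    b->used += aligned;
    return p;
}
static void bump_free(void *ctx, void *ptr) { (void)ctx; (void)ptr; }

/* A "library" function that allocates: the caller decides which allocator. */
char *dup_string(const Allocator *a, const char *s) {
    assert(a != NULL && s != NULL);
    size_t n = strlen(s) + 1;
    char *out = (char *)a->alloc(a->ctx, n);
    if (out) memcpy(out, s, n);
    return out;
}
```

Calling `dup_string(&heap_allocator, "hi")` goes through malloc, while passing an `Allocator` built around a `Bump` puts the copy in your own buffer, with no change to `dup_string` itself.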

Mostly agreed. Going from the standard library allocator to something like jemalloc or tcmalloc will give you around 5-10% wins, which can be significant, but the difference between those generic allocators seems small. I recently wrote a slab allocator for a custom data type and got speedups of 100% over malloc.
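
To give an idea of what that looks like, here is a minimal sketch of a fixed-size slab allocator in C. It is not the code I measured: `Node` is a hypothetical stand-in for the custom data type, and the slab is a single fixed-capacity block with no growth and no thread safety. Allocation and free are each a couple of pointer operations on an intrusive free list, which is roughly where the win over a general-purpose malloc comes from.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical fixed-size object standing in for the custom data type. */
typedef struct Node {
    int key;
    struct Node *left, *right;
} Node;

/* A slot holds either a live Node or, while free, a link to the next free slot. */
typedef union Slot {
    Node node;
    union Slot *next_free;
} Slot;

typedef struct Slab {
    Slot *slots;     /* one contiguous block holding every slot */
    Slot *free_list; /* singly linked list threaded through the free slots */
    size_t capacity;
} Slab;

/* Returns 0 on success, -1 if capacity is zero, too large, or malloc fails. */
int slab_init(Slab *s, size_t capacity) {
    if (capacity == 0 || capacity > SIZE_MAX / sizeof(Slot)) return -1;
    s->slots = (Slot *)malloc(capacity * sizeof(Slot));
    if (!s->slots) return -1;
    s->capacity = capacity;
    s->free_list = NULL;
    /* Push slots in reverse so the first allocation returns slots[0]. */
    for (size_t i = capacity; i-- > 0;) {
        s->slots[i].next_free = s->free_list;
        s->free_list = &s->slots[i];
    }
    return 0;
}

/* O(1): pop the head of the free list; NULL when the slab is exhausted. */
Node *slab_alloc(Slab *s) {
    Slot *slot = s->free_list;
    if (!slot) return NULL;
    s->free_list = slot->next_free;
    return &slot->node;
}

/* O(1): push the slot back onto the free list. */
void slab_free(Slab *s, Node *n) {
    Slot *slot = (Slot *)n; /* a union member shares the union's address */
    assert(slot >= s->slots && slot < s->slots + s->capacity); /* debug check */
    slot->next_free = s->free_list;
    s->free_list = slot;
}

void slab_destroy(Slab *s) {
    free(s->slots);
    s->slots = NULL;
    s->free_list = NULL;
    s->capacity = 0;
}
```

A real one would typically chain several slabs together when the first fills up, but the fast path stays the same.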