Very true, but a lot of stuff builds on a few core optimized libraries like BLAS/LAPACK, and picking up a build of those targeted at a modern microarchitecture can give you 10x or more compared to a non-targeted build.

That said, most of those packages will just read the hardware capability from the OS and dispatch an appropriate codepath anyway. You maybe save some code footprint by restricting the number of codepaths it needs to compile.