It's very non-uniform: 99% see no change, but 1% see 1.5-2x better performance.

I'm wondering if 'somewhat numerical in nature' refers to LAPACK/BLAS and similar libraries, which are actually dependencies of a wide range of desktop applications?

BLAS and LAPACK generally do manual multi-versioning by detecting CPU features at runtime, so they don't gain much here. This is more useful one level up the stack, in things like compression/decompression, ODE solvers, image manipulation and so on: code that still works with big arrays of data but doesn't have a small number of hot kernels (or as much dev time), so it typically relies on the compiler for auto-vectorization.
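
As a minimal sketch of what that multi-versioning can look like with GCC/Clang function multi-versioning (the saxpy kernel and the chosen feature targets are just illustrative, not actual BLAS code):

    // Sketch only: GCC/Clang function multi-versioning on x86-64.
    // The compiler emits one clone per listed target and an ifunc
    // resolver picks the best one at load time based on the CPU.
    #include <stddef.h>

    __attribute__((target_clones("avx2", "sse4.2", "default")))
    void saxpy(size_t n, float a, const float *x, float *y) {
        // One plain loop; each clone gets auto-vectorized for its ISA.
        for (size_t i = 0; i < n; i++)
            y[i] += a * x[i];
    }

    // Hand-rolled runtime check, closer in spirit to what BLAS
    // libraries do when selecting among hand-written kernels:
    int have_avx2(void) { return __builtin_cpu_supports("avx2"); }

Callers just invoke saxpy normally; the loader picks the AVX2 clone on CPUs that support it, which is roughly the idea BLAS implementations apply by hand to their tuned kernels.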

I read it as a roughly 1% performance improvement across the board, not that only 1% of packages get a significant improvement.

In a complicated system, a 1% overall benefit might well come from a 10% improvement in just 10% of the system (or an even larger improvement in a smaller contributor).
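
Roughly, by Amdahl's law (with $f$ the fraction of runtime that improves and $s$ its speedup, both assumed values here):

    \text{overall speedup} = \frac{1}{(1-f) + f/s}, \quad f = 0.1,\ s = 1.1 \ \Rightarrow\ \approx 1.009

i.e. about a 1% overall gain from a 10% gain on 10% of the work.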

The announcement is pretty clear on this:

   > Previous benchmarks (...) show that most packages show a slight (around 1%) performance improvement and some packages, mostly those that are somewhat numerical in nature, improve more than that.