Portable multi-versioning is kind of hard to set up. E.g. compilers on Linux refuse to compile AVX512 intrinsics unless the architecture is enabled via -m... - this is also true when you're trying to set up a dispatching system that relies on cpuid, etc.
Is this specific to AVX512? It works well for e.g. AVX2.
Yes, at least with AVX512 the compiler will throw a fit when you use the intrinsics unless you've enabled the architecture for the whole TU via command-line options.
Seems to work fine for me: https://gcc.godbolt.org/z/hPexshjoa
Likely a different compiler/version. GCC had this error for me recently:
error: inlining failed in call to 'always_inline' 'float _mm512_reduce_add_ps(__m512)': target specific option mismatch
Compiler Explorer link or it didn't happen? :-)
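For what it's worth, here is a minimal sketch of how that error typically comes about and one way around it, assuming x86-64 with a reasonably recent GCC or Clang. The "target specific option mismatch" message generally means an always_inline intrinsic was called from a function whose target options don't include the required ISA. Putting a target attribute on just the calling function, and dispatching at runtime with `__builtin_cpu_supports`, should let the file build without a TU-wide -m option. The names `sum_scalar`, `sum_avx512` and `sum` are made up for illustration.

```cpp
#include <immintrin.h>
#include <cstddef>

// Scalar fallback, compiled with the TU's baseline flags.
static float sum_scalar(const float* p, std::size_t n) {
    float s = 0.0f;
    for (std::size_t i = 0; i < n; ++i) s += p[i];
    return s;
}

// Only this function is compiled with AVX-512F enabled. Dropping the
// attribute (with no -mavx512f on the command line) is what triggers
// "inlining failed in call to 'always_inline' ...".
__attribute__((target("avx512f")))
static float sum_avx512(const float* p, std::size_t n) {
    __m512 acc = _mm512_setzero_ps();
    std::size_t i = 0;
    for (; i + 16 <= n; i += 16)                 // 16 floats per 512-bit vector
        acc = _mm512_add_ps(acc, _mm512_loadu_ps(p + i));
    float s = _mm512_reduce_add_ps(acc);         // horizontal sum of the lanes
    for (; i < n; ++i) s += p[i];                // scalar tail
    return s;
}

// Hypothetical dispatcher: take the AVX-512 path only if the CPU reports it.
float sum(const float* p, std::size_t n) {
    if (__builtin_cpu_supports("avx512f")) return sum_avx512(p, n);
    return sum_scalar(p, n);
}
```

Callers just use `sum(data, n)`. The catch is that the AVX-512 code has to stay inside functions carrying the attribute, since the compiler won't inline them into callers built for the baseline target.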