I made the first proposal to the C++ standard committee to introduce SIMD in 2011, before Matthias Kretz got involved with his own version (which is what became std::simd). This was based on what eventually became Eve (mentioned in the article).
Back then it was rejected, on the same arguments people are making today: it doesn't map well to SVE, it needs a separate way to express control flow, and so on.
There was a real alternative being considered at the time: integrating ISPC-like semantics natively in the language. Then that died out (I'm not sure why), and SIMD became trendy, so the committee was more open to doing something to show that they were keeping up with the times.
> There was a real alternative being considered at the time: integrating ISPC-like semantics natively in the language.
I think this is the best solution for truly portable SIMD. Sure, it doesn't cover everything, but it makes autovec explicit, guaranteed and more powerful.
One of the biggest problems with "portable" SIMD libraries is that for simple things autovec is often better, as it has direct access to the ISA semantics and can much more easily do things like unrolling.
To me it’s clear adding the ability to express intent to parallelise is the Right Thing. This is the only way the compiler can actually know what you want it to do.
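For illustration, one way to state that intent in C++ today is OpenMP's `#pragma omp simd`. This is not the ISPC-like language feature discussed above, only a minimal sketch of the idea of telling the compiler the loop is meant to be vectorised. The function name `saxpy` is made up for the example, and the pragma only takes effect when the build enables OpenMP.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: computes y[i] += a * x[i] for every element.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    const std::size_t n = x.size();
    const float* xp = x.data();
    float* yp = y.data();
    // States the intent explicitly: iterations are independent and may be
    // executed in SIMD lanes. Guarded so the code still builds cleanly
    // when OpenMP is not enabled (it then falls back to plain autovec).
#ifdef _OPENMP
#pragma omp simd
#endif
    for (std::size_t i = 0; i < n; ++i) {
        yp[i] += a * xp[i];
    }
}
```

The loop body stays ordinary scalar code, so the compiler is free to pick the vector width and unrolling for whatever ISA it targets.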
Trying to abstract over SVE with a SIMD library is a bit of a fool's errand. The intended programming model is just too different from traditional ISAs, and there are algorithms that are nearly impossible to write efficiently for it. All the ones I've seen wrap it up as a bastardized fixed length ISA, and even ARM's own guidance basically recommends that approach.
Frankly, the length agnostic stuff is a mistake that I hope hardware designers will eventually see the light on, like delay slots.
> Trying to abstract over SVE with a SIMD library is a bit of a fool's errand
It really isn't. You just make the default SIMD-width agnostic and anything less portable opt-in.
You can still specialize for a specific width on scalable vector ISAs.
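A rough sketch of what "width-agnostic by default, fixed width opt-in" could look like in plain C++. Everything here (`pack`, `native_lanes`, `sum`) is hypothetical and not the API of any real library. The lane count is a compile-time constant that assumes 128-bit vectors purely to keep the example self-contained; on a real scalable ISA it would only be known at run time.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a target query. Assumes 128-bit vectors;
// a real library would derive this from the hardware or build target.
template <class T>
constexpr std::size_t native_lanes() { return 16 / sizeof(T); }

// Hypothetical vector type: the width defaults to the "native" one,
// and a specific width is an explicit opt-in via the second parameter.
template <class T, std::size_t N = native_lanes<T>()>
struct pack {
    std::array<T, N> lane{};
    static constexpr std::size_t size() { return N; }
};

template <class T, std::size_t N>
pack<T, N> operator+(pack<T, N> a, const pack<T, N>& b) {
    for (std::size_t l = 0; l < N; ++l) a.lane[l] += b.lane[l];
    return a;
}

// Width-agnostic kernel: it only ever asks Pack::size(), never hard-codes 4 or 8.
template <class Pack = pack<float>>
float sum(const float* data, std::size_t n) {
    Pack acc{};
    std::size_t i = 0;
    for (; i + Pack::size() <= n; i += Pack::size()) {
        Pack v;
        for (std::size_t l = 0; l < Pack::size(); ++l) v.lane[l] = data[i + l];
        acc = acc + v;
    }
    float total = 0.0f;
    for (std::size_t l = 0; l < Pack::size(); ++l) total += acc.lane[l];
    for (; i < n; ++i) total += data[i];  // scalar tail for leftover elements
    return total;
}
```

Calling `sum(data, n)` uses the default width, while `sum<pack<float, 8>>(data, n)` is the opt-in, less portable form.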
> The intended programming model is just too different from traditional ISAs, and there are algorithms that are nearly impossible to write efficiently for it.
Such as?
> All the ones I've seen wrap it up as a bastardized fixed length ISA, and even ARM's own guidance basically recommends that approach.
Google Highway doesn't. And while Arm is stuck with 128-bit SVE, because they also have to implement NEON as fast as possible to be competitive, RVV already has a large diversity of hardware available with different vector lengths: 128, 256, 512 and 1024 bits.
I'm no C++ dev, but as an outsider, it sure reads like the whole "int is variable length" mistake again.
That abstraction is occasionally useful in low-level systems code, which is why Go, Rust, D and C# support it as well.
Also note that this comes from C, not C++.
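For anyone unfamiliar with the "int is variable length" point, a small sketch of the difference in C++, which inherits the rule from C. `int` is only guaranteed to be at least 16 bits wide, while `std::int32_t` (where the platform provides it) is exactly 32 bits. The helper name `bits_of` is made up for the example.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: storage size of T in bits on the current target.
template <class T>
constexpr std::size_t bits_of() { return sizeof(T) * CHAR_BIT; }

// Fixed-width type: the same answer on every platform that provides it.
static_assert(bits_of<std::int32_t>() == 32, "int32_t is exactly 32 bits");

// Plain int: the standard only promises a minimum, the rest is up to the target.
static_assert(bits_of<int>() >= 16, "int is at least 16 bits");
```

Code that needs a particular width asks for it by name, and code that just wants "whatever is natural here" uses `int`, which is the same default versus opt-in split argued for SIMD widths above.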