> Trying to abstract over SVE with a SIMD library is a bit of a fool's errand
It really isn't. You just make the default SIMD-width agnostic and make anything less portable opt-in.
You can still specialize for a specific width on scalable vector ISAs.
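To make that concrete, here is a rough sketch in plain portable C++, not real SVE or RVV intrinsics. The names `add_agnostic` and `add_fixed` are made up for illustration, and the `lanes` parameter is just a stand-in for whatever the hardware reports at runtime (e.g. via `svcntw()` on SVE). The inner loop models a predicated vector operation.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Width-agnostic default: `lanes` is only known at runtime.
// Assumes a.size() == b.size() and lanes > 0.
std::vector<float> add_agnostic(const std::vector<float>& a,
                                const std::vector<float>& b,
                                std::size_t lanes) {
    std::vector<float> out(a.size());
    for (std::size_t i = 0; i < a.size(); i += lanes) {
        // Models a predicate: only in-bounds lanes are active,
        // so no separate scalar tail loop is needed.
        const std::size_t active = std::min(lanes, a.size() - i);
        for (std::size_t l = 0; l < active; ++l) {
            out[i + l] = a[i + l] + b[i + l];
        }
    }
    return out;
}

// Opt-in specialization: the caller pins the width at compile time.
// A real library could swap in a width-specific kernel here.
template <std::size_t Lanes>
std::vector<float> add_fixed(const std::vector<float>& a,
                             const std::vector<float>& b) {
    static_assert(Lanes > 0, "lane count must be positive");
    return add_agnostic(a, b, Lanes);
}
```

Calling `add_agnostic(a, b, 4)` and `add_agnostic(a, b, 16)` gives the same result, which is the whole point: the algorithm doesn't care about the width unless you explicitly ask for `add_fixed<8>`.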
> The intended programming model is just too different from traditional ISAs, and there are algorithms that are nearly impossible to write efficiently for it.
Such as?
> All the ones I've seen wrap it up as a bastardized fixed length ISA, and even ARM's own guidance basically recommends that approach.
Google Highway doesn't. And while Arm is stuck with 128-bit SVE, because they also have to implement NEON as fast as possible to be competitive, RVV already has a large diversity of hardware with different vector lengths available: 128, 256, 512, and 1024 bits.
From what I found, CNT and CNTP don't seem to be optional for SVE (unless you mean HISTCNT).
It seems to me like you want to use CNTP on a bitset that tells you which rows are relevant, skipping them if the count is 0? Is that what you were describing?
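In scalar terms, what I'm picturing is roughly the following. This is only my guess at the pattern, written in portable C++ rather than SVE intrinsics: `count_active` is a hypothetical stand-in for CNTP on one 64-bit chunk of the mask, and `sum_relevant_rows` and `SkipResult` are invented names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar stand-in for CNTP: number of set bits in one chunk of the mask.
inline int count_active(std::uint64_t chunk) {
    int n = 0;
    while (chunk != 0) {
        chunk &= chunk - 1;  // clear the lowest set bit
        ++n;
    }
    return n;
}

struct SkipResult {
    long long sum;               // sum over the relevant rows
    std::size_t chunks_skipped;  // chunks with no relevant rows
};

// Bit `b` of mask[c] marks row c * 64 + b as relevant.
// Rows beyond the end of the mask are treated as irrelevant.
SkipResult sum_relevant_rows(const std::vector<int>& rows,
                             const std::vector<std::uint64_t>& mask) {
    SkipResult result{0, 0};
    for (std::size_t c = 0; c < mask.size(); ++c) {
        if (count_active(mask[c]) == 0) {
            ++result.chunks_skipped;  // nothing relevant here, skip it
            continue;
        }
        for (std::size_t bit = 0; bit < 64; ++bit) {
            const std::size_t row = c * 64 + bit;
            if (row < rows.size() && ((mask[c] >> bit) & 1u)) {
                result.sum += rows[row];
            }
        }
    }
    return result;
}
```

If that is the idea, the count is only used as a cheap "is anything active?" test per chunk, and the real work stays predicated on the mask.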