> Trying to abstract over SVE with a SIMD library is a bit of a fool's errand

It really isn't. You just make the default SIMD-width agnostic and anything less portable opt-in.

You can still specialize for a specific width on scalable vector ISAs.
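To make that concrete, here is a rough scalar sketch of the split, not real SVE or RVV code. `scale_agnostic` and `scale_fixed` are hypothetical names, and the `lanes` parameter stands in for a vector length that the hardware would report at run time; the inner loops stand in for one vector operation each.

```cpp
#include <cstddef>

// Default path: the lane count is a runtime value, as it would be on a
// scalable vector ISA. `lanes` must be greater than zero.
void scale_agnostic(float* data, std::size_t n, float k, std::size_t lanes) {
    for (std::size_t i = 0; i < n; i += lanes) {
        // Tail handling: only the remaining elements are active,
        // which is the job a predicate/mask does on real hardware.
        const std::size_t active = (n - i < lanes) ? (n - i) : lanes;
        for (std::size_t l = 0; l < active; ++l) {
            data[i + l] *= k;
        }
    }
}

// Opt-in path: the lane count is fixed at compile time, so the caller
// explicitly gives up portability across vector lengths.
template <std::size_t Lanes>
void scale_fixed(float* data, std::size_t n, float k) {
    std::size_t i = 0;
    for (; i + Lanes <= n; i += Lanes) {
        for (std::size_t l = 0; l < Lanes; ++l) {
            data[i + l] *= k;
        }
    }
    for (; i < n; ++i) {  // scalar remainder
        data[i] *= k;
    }
}
```

Both paths compute the same result; the only difference is whether the width is baked in at compile time or discovered at run time.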

> The intended programming model is just too different from traditional ISAs, and there are algorithms that are nearly impossible to write efficiently for it.

Such as?

> All the ones I've seen wrap it up as a bastardized fixed length ISA, and even ARM's own guidance basically recommends that approach.

Google Highway doesn't. And while Arm is stuck with 128-bit SVE, because they also have to implement NEON as fast as possible to be competitive, RVV already has a large diversity of hardware available with different vector lengths: 128, 256, 512, and 1024 bits.

> Such as?

I have a database with big columns that get functions applied to them to compute the result set. This is a perfect case for length-agnostic instructions, except it ends up horribly memory bound. A nice optimization is to compute only those lanes containing rows that might actually be in the result set, by keeping track of a sparse record that depends on the lane size. But the cnt instructions are optional, and this also inhibits compiler optimizations in that lookup.

CNT and CNTP don't seem to be optional for SVE, from what I found (unless you mean HISTCNT).

It seems to me like you want to use CNTP on a bitset that tells you which rows are relevant, skipping them if the count is 0? Is that what you were describing?
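Roughly what I have in mind, as a portable scalar sketch rather than actual SVE code: each 64-bit mask word stands in for one predicate's worth of rows, and a plain popcount stands in for CNTP. The names `popcount64` and `apply_to_active_rows` are made up for illustration, and the 64-row block size is an assumption, not something tied to a real vector length.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Portable population count; stands in for CNTP on a predicate.
inline int popcount64(std::uint64_t x) {
    int n = 0;
    while (x != 0) {
        x &= x - 1;  // clear the lowest set bit
        ++n;
    }
    return n;
}

// Apply f only to rows whose bit is set in `mask`. Blocks of 64 rows with
// no set bits are skipped without touching the column data.
// Returns the number of skipped blocks.
template <typename F>
std::size_t apply_to_active_rows(std::vector<double>& column,
                                 const std::vector<std::uint64_t>& mask,
                                 F f) {
    std::size_t blocks_skipped = 0;
    for (std::size_t w = 0; w < mask.size(); ++w) {
        const std::uint64_t bits = mask[w];
        if (popcount64(bits) == 0) {  // no relevant rows in this block
            ++blocks_skipped;
            continue;
        }
        const std::size_t base = w * 64;
        for (std::size_t lane = 0; lane < 64; ++lane) {
            const std::size_t row = base + lane;
            if (row >= column.size()) break;  // mask may cover a partial block
            if ((bits >> lane) & 1u) {
                column[row] = f(column[row]);
            }
        }
    }
    return blocks_skipped;
}
```

The point of the early `continue` is that a fully inactive block never loads its column data, which is where the savings would come from in a memory-bound workload.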