Even when the layout is SIMD-friendly, auto-vectorization can be finicky. As a programmer, it’s really annoying to constantly inspect compiler output to check whether the code was actually vectorized. Even if it was, a slight code change or a compiler update can throw the whole thing off. Auto-vectorization is nice when you get performance improvements for “free”, but I find it too fragile for the really critical parts where you absolutely need the code to be vectorized.
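
To make the fragility concrete, here is a minimal sketch in Rust (the language choice and the function names `add_elementwise` and `sum_in_order` are just assumptions for illustration). Both loops look equally trivial, but at typical optimization levels the first is usually turned into SIMD code while the second usually is not, because floating-point addition is not associative and the compiler may not reorder the additions. Whether either one actually vectorizes depends on the compiler version, flags, and target, which is exactly the problem.

```rust
/// Element-wise add: every iteration is independent of the others,
/// so optimizing compilers usually emit SIMD adds for this loop.
pub fn add_elementwise(a: &[f32], b: &[f32], out: &mut [f32]) {
    // Re-slice everything to a common length up front so the loop body
    // indexes slices whose length is known to be `n`.
    let n = out.len().min(a.len()).min(b.len());
    let (a, b, out) = (&a[..n], &b[..n], &mut out[..n]);
    for i in 0..n {
        out[i] = a[i] + b[i];
    }
}

/// Sum reduction: each addition depends on the previous one, and float
/// addition is not associative, so the compiler has to keep the additions
/// in order and typically falls back to scalar code.
pub fn sum_in_order(a: &[f32]) -> f32 {
    let mut acc = 0.0f32;
    for &x in a {
        acc += x;
    }
    acc
}
```

Nothing in the source tells you which of the two got vectorized; the only way to know is to read the generated assembly, and to read it again after every refactor or toolchain bump.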

I often wonder about a macro-like thing where we could write a function in a SIMD-aware subset of the language. Something a bit higher level than using intrinsics or those SIMD libs.