The Highway library is exactly that kind of simpler way to use SIMD. It's less efficient than hand-written assembly, but you can easily write good-enough SIMD code that works across multiple architectures.