consider using fp16 or bf16 for the matrix math (in SME you can use svmopa_za16_f16_m or svmopa_za16_bf16_m)