The first implementation I encountered was a linear search, starting at the last-found field. Empirically it performed better to do a binary search with early exit and branchless bounds selection, I think due to branch predictor pressure. The data representation could be changed but it's tricky, as there are other traversals that want to go in sorted order, and there are lots of places that pass just one pointer for fields. But I agree any further improvement will probably have to come from that.
SIMD is tricky even with SoA because there is significant latency going between the general registers and the vector units, plus arm little cores can be configured to share a vector unit with another core.
> SIMD is tricky even with SoA because there is significant latency going between the general registers and the vector units
My experience is mostly limited to AMD64, but libraries like glibc use SIMD in many places for faster linear search. Presumably they've done testing and found it worth while.