For a pretty small N I've found that less clever can be quite a bit faster. I'd try a linear search - possibly SIMD if you can change the data format to struct-of-arrays. An adaptive approach that uses linear search up to a certain N can also yield some benefit.

The first implementation I encountered was a linear search, starting at the last-found field. Empirically it performed better to do a binary search with early exit and branchless bounds selection, I think due to branch predictor pressure. The data representation could be changed but it's tricky, as there are other traversals that want to go in sorted order, and there are lots of places that pass just one pointer for fields. But I agree any further improvement will probably have to come from that.

SIMD is tricky even with SoA because there is significant latency going between the general registers and the vector units, plus arm little cores can be configured to share a vector unit with another core.

If you control the layout, eytzinger layout typically will give you the best of both worlds. As fast as a linear scan for small N, much faster than binary search over a sorted array for large N.