Seems like you can use the core intuition here of SIMD comparison of multiple elements on more than just the terminal scale.
The outline would be:
a) use a gather to grab multiple elements from 16 evenly spaced locations
b) compare these in parallel using a SIMD instruction
c) focus in on the correct block
d) if the block is small revert to linear search, else repeat the gather/compare cycle
Even though the gather instruction is reading from non-contiguous memory and reading more than you normally would need to with a binary sort, enabling a multiway compare and collapsing 4 levels of binary search should be a win on large tables.
You also may not be able to do a full 16-way comparison for all data types. Searching for float64 will limit you to 8 way comparisons (with AVX-512) but int32, float32 will allow 16 way comparisons.
Gather is extremely slow. Anyone aiming for efficiency will avoid gathers.
I bet you a binary search is in fact faster than any gather based methodology.