Since binary search is already very fast with its O(log n) time complexity, are there any real-world applications that could practically benefit from this improvement?

I guess it matters if you have to do lookups in a tight loop. If you only do this occasionally, I don't think it's worth it, especially for complex objects with custom comparators. The algorithm is still O(log n), just a more advanced "divide and conquer" with a smaller constant factor.
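
For a sense of what "smaller constant" means in practice, here is a minimal branchless lower_bound sketch (my own illustration, not the article's code): the same O(log n) probes, but the loop body becomes a conditional move instead of a hard-to-predict branch, which is where the win in a tight loop comes from.

    #include <cstddef>
    #include <cstdio>
    #include <vector>

    // Branchless lower_bound over a sorted vector: the same number of probes
    // as the textbook version, but no unpredictable branch in the loop body.
    std::size_t lower_bound_branchless(const std::vector<int>& v, int key) {
        if (v.empty()) return 0;
        const int* base = v.data();
        std::size_t len = v.size();
        while (len > 1) {
            std::size_t half = len / 2;
            // Compiles to a conditional move: advance iff the probe is < key.
            base += (base[half - 1] < key) ? half : 0;
            len -= half;
        }
        return static_cast<std::size_t>(base - v.data()) + (*base < key ? 1 : 0);
    }

    int main() {
        std::vector<int> v{1, 3, 5, 7, 9};
        std::printf("%zu\n", lower_bound_branchless(v, 6));  // prints 3
    }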

I would expect the standard library of various languages to provide an optimised implementation such as this. Then everyone downstream benefits, and keeps benefiting from future improvements when their code is compiled against a newer version of the language or executed under a newer version of the runtime.

You see this in Rust, where they replaced the hash table implementation many years ago, the channel implementation a couple of years ago, and most recently the algorithms behind both stable and unstable sort. I expect other languages and runtimes do similar things over time as CPUs change and new approaches are discovered.

I wouldn't. This is very specialized to the type of the elements.

Some languages, such as C++, allow for specialisation via templates and compile-time evaluation (constexpr). It would be possible to detect when the element type matches the size of one of the integer types, is a POD, is comparable via memcmp, etc., and dispatch to SIMD-optimised algorithms.
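
A rough sketch of what that compile-time dispatch could look like, assuming a hypothetical simd_lower_bound_u16 as the specialised routine (here just a placeholder that defers to std::lower_bound so the sketch compiles):

    #include <algorithm>
    #include <cstdint>
    #include <cstdio>
    #include <string>
    #include <type_traits>
    #include <vector>

    // Placeholder for a real vectorised 16-bit search; it only exists so the
    // sketch runs and shows where a SIMD implementation would plug in.
    std::size_t simd_lower_bound_u16(const std::uint16_t* data, std::size_t n,
                                     std::uint16_t key) {
        return static_cast<std::size_t>(std::lower_bound(data, data + n, key) - data);
    }

    template <typename T>
    std::size_t generic_lower_bound(const std::vector<T>& v, const T& key) {
        if constexpr (std::is_integral_v<T> && std::is_unsigned_v<T> &&
                      sizeof(T) == sizeof(std::uint16_t)) {
            // Element type is layout-compatible with uint16_t: take the
            // specialised path.
            return simd_lower_bound_u16(
                reinterpret_cast<const std::uint16_t*>(v.data()), v.size(), key);
        } else {
            // Anything else (non-trivial types, custom ordering, ...) falls
            // back to the generic algorithm.
            return static_cast<std::size_t>(
                std::lower_bound(v.begin(), v.end(), key) - v.begin());
        }
    }

    int main() {
        std::vector<std::uint16_t> a{10, 20, 30};
        std::vector<std::string> b{"ant", "bee", "cat"};
        std::printf("%zu %zu\n", generic_lower_bound<std::uint16_t>(a, 25),
                    generic_lower_bound<std::string>(b, std::string("bee")));
    }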

It looks like C++26 will get compile-time reflection, which would make things like this even more feasible.

This is a drop-in improvement for essentially any binary search over 16-bit integer members.

With "practically benefit" I meant a speedup that is noticable. Is there any software that is significantly bottlenecked by the speed of sorted search?

I think it's possible to come up with a situation where you want to do a sorted search for every pixel on the screen, every frame.
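
For scale (my numbers, not from the thread): a 1920x1080 frame is about 2 million pixels, so one sorted lookup per pixel at 60 fps is roughly 124 million searches per second. A toy sketch of such a pass, posterising luminance against a small sorted threshold table:

    #include <algorithm>
    #include <cstdint>
    #include <vector>

    // Map each pixel's luminance to a band index by binary-searching a sorted
    // threshold table. With ~2M pixels per frame at 60 fps, the constant
    // factor of the search dominates the cost of the whole pass.
    void posterize(std::vector<std::uint8_t>& luma,
                   const std::vector<std::uint8_t>& thresholds /* sorted */) {
        for (auto& px : luma) {
            auto it = std::lower_bound(thresholds.begin(), thresholds.end(), px);
            px = static_cast<std::uint8_t>(it - thresholds.begin());
        }
    }

    int main() {
        std::vector<std::uint8_t> frame(1920 * 1080, 200);  // dummy frame
        posterize(frame, {32, 64, 96, 128, 160, 192, 224});
    }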

That sounds promising. I think ray tracing checks for ray intersections against an unsorted polygon soup, though. Sorted data seems hard to come by.