I guess it matters if you have to do lookups in a tight loop. If you only do this occasionally, I think it's not worth it, especially for complex objects with custom comparators. The algorithm is still O(log n), just a more advanced "divide and conquer" with a smaller constant factor.

I would expect the standard library of various languages to provide an optimised implementation such as this. Then everyone downstream benefits, and benefits from future improvements when compiled for a newer version of the language / executed under a newer version of the runtime.

You see this in Rust, where they replaced the hash table implementation many years ago, the channel implementation a couple of years ago, and most recently the sort implementations for both stable and unstable sort. I expect other languages / runtimes do similar things over time as CPUs change and new approaches are discovered.

I wouldn't. This is very specialized to the type of the elements.

Some languages, such as C++, allow for specialisation via templates and compile-time evaluation (constexpr). It would be possible to detect when the size of the data type matches one of the integer types, when it is a POD, when it is comparable via memcmp, etc., and dispatch to SIMD-optimised algorithms in those cases.

It is looking like C++26 will get compile-time reflection, which would make things like this even more feasible.