So is SIMD the magic piece here, or is it the interpolation search? If the data is evenly distributed, that is pretty much the best case for interpolation search.
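
For concreteness, here is a minimal sketch of plain interpolation search, assuming a sorted list of integer keys. This is not the article's code, and the function name is made up for illustration. With evenly spaced keys, the first probe lands on or right next to the target, which is why uniform data flatters it.

```python
def interpolation_search(keys, target):
    """Return an index of target in the sorted list keys, or -1 if absent.

    Illustrative sketch only: assumes integer keys in ascending order.
    """
    lo, hi = 0, len(keys) - 1
    while lo <= hi and keys[lo] <= target <= keys[hi]:
        if keys[lo] == keys[hi]:
            # Every remaining key is equal, so skip the division below.
            return lo if keys[lo] == target else -1
        # Guess the position by assuming keys grow linearly between lo and hi.
        pos = lo + (target - keys[lo]) * (hi - lo) // (keys[hi] - keys[lo])
        if keys[pos] == target:
            return pos
        if keys[pos] < target:
            lo = pos + 1
        else:
            hi = pos - 1
    return -1


if __name__ == "__main__":
    evenly_spaced = list(range(0, 1000, 10))
    print(interpolation_search(evenly_spaced, 370))
    print(interpolation_search(evenly_spaced, 375))
```

On skewed data the linear guess can be far off, and the number of probes can degrade toward a linear scan.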

In the Intel CPU + cold cache case, the quad search matters. In the other three cases, only the SIMD matters.

To put it another way: this is addressed in the article.