I did little experiments with search in small arrays (16-32 items) and binary search is one of the worst methods because it requires lot of branches. The fastest method for small arrays was linear branchless search (you walk over all elements without breaking out of the loop. For example, if you want to know whether the array contains a number, you logically OR the checks for all items). I didn't use SIMD though, but the branches are very expensive for small arrays and simply checking all elements without branching is faster.
I wonder if this is faster because it makes the prefetcher happy.