> If you have priors about the data distribution, then it's possible to design algorithms which use that extra information to perform MUCH better.

You don't even need priors. See interpolation search, where knowing the position and value of two elements in a sorted list already allows the search to make an educated guess about where the element it's searching for is by estimating the likely place it would be by interpolating the elements.

> knowing the position and value of two elements in a sorted list

That's a prior about the distribution, if a relatively weak one (in some sense, at least).

This relies on knowledge of the distribution, just querying in the middle of A = [1, 2, 4, 8, 16, ..., 2^(n-1)] is slower than binary search

> just querying in the middle

It's an interpolation search. You interpolate the values you evaluated by whatever method you'd like. No one forces you to do linear interpolation. You can very easily fit a quadratic polynomial with the last 3 points, for example.

Interpolation search seems to have a convergence rate of log log n. That's pretty efficient.