Yep, you are right. Also, quantization is a big issue here. For instance, int8 quantization has minimal effect on recall but makes the dot product between vectors much faster, which speeds things up a lot. The number of components in the vectors also makes a huge difference. Another thing I didn't mention is that the Redis implementation (vector sets), for instance, is threaded, so the numbers I reported are not for a single core. Btw, I agree with your comment, thank you. What I wanted to say is simply that the results you get, and the results I get, are not "out of this world" and are very credible. Have a nice day :)
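To make the int8 point concrete, here is a minimal sketch, assuming simple symmetric per-vector quantization (scale by the largest absolute component, round into [-127, 127]). This is not the Redis vector sets code; the function names and the 256-component size are made up for illustration. It only shows that the quantized dot product stays close to the float one. The speedup itself comes from SIMD integer instructions in native code, which plain Python loops will not show.

```python
import math
import random

def quantize_int8(vec):
    """Symmetric per-vector quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(x) for x in vec)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    return [round(x / scale) for x in vec], scale

def dot_int8(qa, scale_a, qb, scale_b):
    """Integer dot product, rescaled back to the float range."""
    return sum(x * y for x, y in zip(qa, qb)) * scale_a * scale_b

def dot_float(a, b):
    """Reference dot product on the original float components."""
    return sum(x * y for x, y in zip(a, b))

def normalize(vec):
    """Scale a vector to unit length, so the dot product is a cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

random.seed(0)
dim = 256  # hypothetical number of components, chosen for illustration
a = normalize([random.gauss(0, 1) for _ in range(dim)])
b = normalize([random.gauss(0, 1) for _ in range(dim)])

qa, sa = quantize_int8(a)
qb, sb = quantize_int8(b)

exact = dot_float(a, b)
approx = dot_int8(qa, sa, qb, sb)
error = abs(exact - approx)
print(f"float dot: {exact:.5f}  int8 dot: {approx:.5f}  abs error: {error:.5f}")
```

Each component is off by at most half a quantization step, so the error on the similarity score is small compared to the gaps between real neighbors, which is consistent with recall barely moving.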
Appreciate the thoughtful breakdown. You're absolutely right that quantization, dimensionality, and threading all play a big role in performance numbers. Thanks for the kind words and for engaging in the discussion. Wishing you a happy Year of the Horse: Happy New Year, and may the Year of the Horse bring you great luck!