Pick an embedding model that supports binary quantization and then use a SIMD-optimized Hamming Distance function. I'm doing this for Scour and doing about 1.6 billion comparisons per second.
Pick an embedding model that supports binary quantization and then use a SIMD-optimized Hamming Distance function. I'm doing this for Scour and doing about 1.6 billion comparisons per second.