> The vectors are usually pretty big and you need to keep them in memory to get truly great performance. FTS and grep are way less hassle.

If you find disk I/O for grep acceptable, why would it matter for vectors? They aren’t much bigger, are they?

The ultimate bottleneck in any search application is IOPS: how much data you can get off disk to compare within a tolerable time span.

Embeddings are huge compared to what you need with FTS, which generally has good locality, compresses extremely well, and permits sub-linear intersection algorithms and other tricks to make the most of your IOPS.
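To make the "sub-linear intersection" point concrete, here is a minimal sketch (not any particular engine's implementation) of intersecting two sorted postings lists: because doc ids are sorted, you can binary-search past whole runs of non-matching entries instead of touching every one.

```python
from bisect import bisect_left

def intersect(a, b):
    """Intersect two sorted postings lists of doc ids, sub-linearly
    in the longer list: iterate the shorter list and binary-search
    the longer one, resuming each search where the last one ended."""
    if len(a) > len(b):
        a, b = b, a
    out, lo = [], 0
    for doc in a:
        lo = bisect_left(b, doc, lo)  # skip ahead without scanning
        if lo == len(b):
            break
        if b[lo] == doc:
            out.append(doc)
    return out

print(intersect([2, 5, 9, 40], [1, 2, 3, 9, 100]))  # [2, 9]
```

Since postings lists are stored contiguously and sorted, each one also compresses well (delta-encoded gaps) and reads sequentially, which is exactly the locality that per-vector random reads can't give you.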

Regardless of vector size, you are unlikely to get more than one embedding per I/O operation with a vector approach. Even if you can fit more vectors into a block, there is no good way of arranging them to ensure efficient locality like you can with e.g. a postings list.

Thus, off a 500K IOPS drive with a 100ms execution window, your theoretical upper bound is 50K embeddings ranked, assuming the ranking itself takes no time, no other disk operations are performed, and you have only a single user.
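The back-of-envelope arithmetic behind that bound, with the figures above as inputs:

```python
# One embedding per random read, so reads-per-window caps embeddings ranked.
iops = 500_000        # drive's random-read IOPS
window_s = 0.100      # 100 ms latency budget
max_embeddings = int(iops * window_s)
print(max_embeddings)  # 50000
```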

Given that you are more than likely comparing multiple embeddings per document, this carriage turns into a pumpkin pretty rapidly.