In RAG, you operate on embeddings and perform vector search, so if you search for "fat lady", it might also retrieve text like "huge queen", because they're semantically similar. Grep, on the other hand, only matches exact strings, so it would not find it.
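To make the contrast concrete, here is a toy sketch. The word vectors are made up purely for illustration (a real system would use learned embeddings), but the point holds: an exact substring match fails where cosine similarity over embeddings succeeds.

```python
import math

# Toy 2-d word vectors, invented for illustration only.
# Real systems use learned embeddings with hundreds of dimensions.
vectors = {
    "fat": [0.9, 0.1], "huge": [0.8, 0.2],
    "lady": [0.1, 0.9], "queen": [0.2, 0.8],
}

def embed(phrase):
    # Average the word vectors of the phrase (a simple bag-of-words baseline).
    words = [vectors[w] for w in phrase.split()]
    return [sum(dim) / len(words) for dim in zip(*words)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

query, doc = "fat lady", "huge queen"
print(query in doc)                             # grep-style exact match: False
print(cosine(embed(query), embed(doc)) > 0.9)   # semantic similarity: True
```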
R in RAG is for retrieval… of any kind. It doesn’t have to be vector search.
Sure, but vector search is the dominant form of RAG, the rest are niche. Saying "RAG doesn’t have to use vectors" is like saying "LLMs don't have to use transformers". Technically true, but irrelevant when 99% of what's in use today does.
How are they niche? The default mode of search for most dedicated RAG apps nowadays is hybrid search that blends classical BM25 keyword search with HNSW-based embedding search. That's already breaking the definition.
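For anyone unfamiliar with how the blending works: a common way to combine a BM25 ranking with a vector ranking is reciprocal rank fusion (RRF). Minimal sketch below; the two rankings are hard-coded stand-ins for what a BM25 index and an HNSW index would actually return.

```python
# Reciprocal rank fusion: each ranking contributes 1 / (k + rank) per document,
# so documents ranked highly by either retriever float to the top.

def rrf(rankings, k=60):
    # `rankings` is a list of doc-id lists, best first; k=60 is a common default.
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_ranking = ["d3", "d1", "d5"]    # stand-in: exact-term matches
vector_ranking = ["d2", "d1", "d3"]  # stand-in: semantic matches
print(rrf([bm25_ranking, vector_ranking]))  # ['d3', 'd1', 'd2', 'd5']
```

Documents found by both retrievers (d1, d3) beat documents found by only one, which is exactly the behavior a hybrid stack is after.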
A search is a search. The architecture doesn't care if it's doing a vector search, a text search, a keyword search, or a regex search; it's all the same. Deploying a RAG app means trying different search methods, or using multiple methods simultaneously or sequentially, to get the best performance for your corpus and use case.
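The "sequentially" part can be as simple as a fallback chain. Sketch below under loose assumptions: `vector_search` is a placeholder stub, not a real library call, and the corpus is three hard-coded strings.

```python
# Sequential retrieval: try a cheap exact keyword search first, fall back to a
# (stubbed) vector search only when too few hits come back.

DOCS = {
    "d1": "error code E042 in the billing service",
    "d2": "payment failures and invoice retries",
    "d3": "user login audit trail",
}

def keyword_search(query):
    # Exact substring match, grep-style.
    return [doc_id for doc_id, text in DOCS.items() if query in text]

def vector_search(query):
    # Stub: a real system would rank DOCS by embedding similarity to `query`.
    return ["d2"]

def retrieve(query, min_hits=1):
    hits = keyword_search(query)
    return hits if len(hits) >= min_hits else vector_search(query)

print(retrieve("E042"))              # exact term found: ['d1']
print(retrieve("billing problems"))  # no exact match, falls back: ['d2']
```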
Most hybrid stacks (BM25 + dense via HNSW/IVF) still rely on embeddings as a first-class signal. So in practice the vector side carries recall on paraphrase/synonymy/OOV vocab, while BM25 stabilizes precision on exact-term and short-document cases. So my point still stands.
> The architecture doesn't care
The architecture does care because latency, recall shape, and failure modes differ.
I don't know of any serious RAG deployments that don't use vectors. I'm referring to large scale systems, not hobby projects or small sites.
This isn't the case.
RAG means any kind of data lookup which improves LLM generation results. I work in this area and speak to tons of companies doing RAG, and almost all of them these days have realised that hybrid approaches are way better than pure vector search.
Standard understanding of RAG now is simply adding any data to the context to improve the result.