This was essentially my response as well, but the other replies to you also have a point, and I think the key here is that the 'Retrieval' in RAG is very vague: depending on who you are and what you got into RAG for, the term means different things.
I am definitely more aligned with needing what I would rather call 'deep semantic search and generation': the ability to query the text-chunk embeddings of, say, 100k PDFs, search by the closeness of the 'ideas' rather than the literal words, feed the best matches into the LLM's context, and have the LLM generate a response to the prompt that cites the source PDF(s) the closest-matching vectors came from.
That is the killer app of a 'deep research' assistant IMO, and you don't get it by just grepping for words and feeding the related files into the context window.
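To make that concrete, here's a minimal sketch of the loop I mean, assuming the sentence-transformers library and a brute-force cosine search. A real system would swap in FAISS/pgvector and a proper PDF-extraction-and-chunking pipeline; the model name, the sample chunks, and the final llm.generate call are all placeholders:

```python
# Minimal sketch of 'deep semantic search and generation'.
# Assumes: pip install sentence-transformers numpy
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # small, CPU-friendly example model

# (source_pdf, chunk_text) pairs -- stand-ins for chunks extracted
# from the 100k PDFs above.
chunks = [
    ("reactor_safety.pdf", "Passive cooling removes decay heat without pumps."),
    ("grid_storage.pdf", "Flow batteries trade energy density for cycle life."),
]

texts = [t for _, t in chunks]
# normalize_embeddings=True makes a plain dot product equal cosine similarity
emb = model.encode(texts, normalize_embeddings=True)

def search(query: str, k: int = 3):
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = emb @ q                      # cosine similarity against every chunk
    top = np.argsort(-scores)[:k]
    return [(chunks[i][0], chunks[i][1], float(scores[i])) for i in top]

query = "how do you cool a reactor when power is lost?"
hits = search(query)

# Build the LLM prompt: matched chunks tagged with their source PDFs,
# so the model can cite where each idea came from.
context = "\n".join(f"[{src}] {text}" for src, text, _ in hits)
prompt = f"Answer using the sources below and cite them.\n\n{context}\n\nQ: {query}"
# response = llm.generate(prompt)  # hypothetical LLM call, not a real API
```

The point is that the search step matches on meaning, so "cool a reactor when power is lost" finds "passive cooling removes decay heat without pumps" even though they share almost no words; grep can't do that.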
The downside is: how do you generate embeddings for massive amounts of mixed-media files and store them in a database quickly and cheaply, compared to just grepping a few terms out of those same files? A CPU grepping text already in RAM is something like five orders of magnitude faster than a GPU embedding model chunking the file, generating semantic embeddings, and storing them for later.
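If you want to feel that gap yourself, here's a toy comparison, again assuming the same MiniLM model as above; the absolute numbers depend entirely on your hardware and model size, but the shape of the result doesn't:

```python
# Toy illustration of scan-vs-embed cost, not a rigorous benchmark.
import time
from sentence_transformers import SentenceTransformer

docs = ["some chunk of text about cooling systems " * 20] * 1000

# 1) grep-style substring scan over text already in RAM
t0 = time.perf_counter()
matches = [d for d in docs if "cooling" in d]
scan_s = time.perf_counter() - t0

# 2) embedding the same chunks -- the one-time ingest cost
model = SentenceTransformer("all-MiniLM-L6-v2")
t0 = time.perf_counter()
model.encode(docs, batch_size=64)  # batching amortizes per-call overhead
embed_s = time.perf_counter() - t0

print(f"substring scan: {scan_s:.4f}s, embedding: {embed_s:.1f}s")
```

The saving grace is that the embedding cost is paid once per corpus at ingest time, while grep pays per query; but for 100k mixed-media PDFs that one-time bill is exactly the problem.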