That's fair, but how do you grep down to the right 100-200 documents from millions without semantic understanding? If someone asks "What's our supply chain exposure?" grep won't find documents discussing "vendor dependencies" or "sourcing risks."

You could expand grep queries with synonyms, but now you're reimplementing query expansion, which is already part of modern RAG. And doing that intelligently means you're back to using embeddings anyway.
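For illustration, the naive synonym route looks something like this (a hedged sketch; the synonym table is invented, and it's exactly the part you'd be stuck maintaining by hand):

```python
import subprocess

# Hand-maintained synonym table: illustrative entries only, and
# precisely the thing you'd end up reimplementing and maintaining.
SYNONYMS = {
    "supply chain": ["vendor dependencies", "sourcing risks", "procurement"],
}

def expanded_grep(term, root="."):
    """Run grep once per synonym and merge the matching file paths."""
    hits = set()
    for query in [term, *SYNONYMS.get(term, [])]:
        out = subprocess.run(["grep", "-ril", query, root],
                             capture_output=True, text=True)
        hits.update(out.stdout.splitlines())
    return sorted(hits)
```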

The workflow works great for codebases with consistent terminology. For enterprise knowledge bases with varied language and conceptual queries, grep alone can't get you to the right candidates.

the agent greps for the obvious term or terms, reads the resulting documents, discovers new terms to grep for, and the process repeats until it's satisfied it has enough info to answer the question

> You could expand grep queries with synonyms, but now you're reimplementing query expansion, which is already part of modern RAG.

in this scenario "you" are not implementing anything - the agent will do this on its own

this is based on my experience using claude code in a codebase that definitely does not have consistent terminology

it doesn't always work, but it seemed like you were thinking in terms of getting things right in a single grep, when it's actually a series of greps, each informed by the results of the previous ones
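roughly, the loop looks like this (a minimal sketch; `propose_terms` and `read_doc` are hypothetical stand-ins for the agent's model call and document reading, not any real API):

```python
import subprocess

def grep_files(term, root="."):
    """Paths of files under `root` containing `term` (case-insensitive)."""
    out = subprocess.run(["grep", "-ril", term, root],
                         capture_output=True, text=True)
    return out.stdout.splitlines()

def agentic_grep(question, propose_terms, read_doc, max_rounds=5):
    """grep -> read -> discover new terms -> grep again, until satisfied.

    propose_terms(question, evidence) stands in for the agent's model
    call: it returns fresh search terms, or [] when it has enough.
    read_doc(path) returns the text of one document.
    """
    evidence, seen = [], set()
    for _ in range(max_rounds):
        terms = [t for t in propose_terms(question, evidence) if t not in seen]
        if not terms:
            break  # the agent is satisfied it can answer
        seen.update(terms)
        for term in terms:
            for path in grep_files(term):
                evidence.append(read_doc(path))
    return evidence
```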

> Classical search

Which is RAG. How you take a set of documents too large for an LLM context window and narrow it down to a set that does fit is an implementation detail.

The chunk, embed, similarity-search method was just a way to get a decent classical search pipeline up and running without too much effort.
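In sketch form, with a toy hashed bag-of-words standing in for a real embedding model (swap in an actual model in practice):

```python
import numpy as np

def embed(texts, dim=256):
    """Toy stand-in for an embedding model: hashed bag-of-words,
    unit-normalized so dot products are cosine similarities."""
    vecs = np.zeros((len(texts), dim))
    for i, text in enumerate(texts):
        for tok in text.lower().split():
            vecs[i, hash(tok) % dim] += 1.0
    norms = np.linalg.norm(vecs, axis=1, keepdims=True)
    return vecs / np.maximum(norms, 1e-9)

def top_k(query, chunks, k=5):
    """Chunk-embed-similarity-search: score every chunk against the
    query and return the k best matches."""
    q = embed([query])[0]
    scores = embed(chunks) @ q
    return [chunks[i] for i in np.argsort(scores)[::-1][:k]]
```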