Agentic search with a handful of basic tools (BM25, semantic search, tags, SQL, a knowledge graph, and a few custom retrieval functions) blows the lid off RAG in my experience. The downside is that it takes longer. A single “investigation” can easily use 20-30 different function calls. RAG is like a static, one-shot version of this: the results are inferior, but the process is a lot faster.

Hey, I’m interested in what you call “agentic search”. Did you roll your own or are you using a set of integrated tools?

I’ve used LightRAG and am looking to integrate it with OpenWebUI and possibly Airweave, which was a Show HN earlier.

My data is highly structured and has references between documents, so I wanted to leverage that structure for better retrieval and reasoning.

Rolled my own in Python.

For graph/tree document representations, it’s common in RAG to use summaries and aggregation. For example, the search yields a match on a chunk, but you want to include context from adjacent chunks — either laterally, from the same document section, or vertically, going up a level to include the title and summary of the parent node. How you integrate and aggregate the surrounding context is up to you. Different RAG systems handle it differently, each with its own trade-offs. The point is that the system is static and hardcoded.
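A minimal sketch of what that static aggregation looks like. The node structure and field names ("parent_id", "children", "summary", etc.) are assumptions for illustration, not from any particular library:

```python
def aggregate_context(nodes: dict, hit_id: str, window: int = 1) -> str:
    """Expand a matched chunk laterally (sibling chunks) and vertically (parent summary)."""
    hit = nodes[hit_id]
    parent = nodes.get(hit.get("parent_id"))

    parts = []
    if parent:
        # Vertical: prepend the parent section's title and summary.
        parts.append(f"{parent['title']}\n{parent['summary']}")

    # Lateral: include neighbouring chunks from the same section.
    siblings = parent["children"] if parent else [hit_id]
    idx = siblings.index(hit_id)
    for sib_id in siblings[max(0, idx - window): idx + window + 1]:
        parts.append(nodes[sib_id]["text"])

    return "\n\n".join(parts)
```

The key point is that `window` and the aggregation rules are fixed at build time, which is exactly the static behaviour the agentic approach avoids.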

The agentic approach is: instead of trying to synthesize and rank/re-rank your search results into a single deliverable, why not leave that to the LLM, which can dynamically traverse your data? For a document tree, I would try exposing the tree structure to the LLM. Return the result with pointers to relevant neighbor nodes, each with a short description. Then the LLM can decide, based on what it finds, to run a new search or explore local nodes.
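One way that could look as a pair of tools exposed to the LLM. This is a hedged sketch: the node schema and the `search_index` callable are hypothetical stand-ins for whatever retrieval backend you already have:

```python
def search(query: str, search_index, nodes: dict, k: int = 5) -> list[dict]:
    """Return top-k chunks plus pointers to neighbours the LLM may choose to expand."""
    results = []
    for node_id in search_index(query, k=k):
        node = nodes[node_id]
        parent = nodes.get(node.get("parent_id"))
        results.append({
            "id": node_id,
            "text": node["text"],
            "parent": {"id": node["parent_id"], "title": parent["title"]} if parent else None,
            # Pointers with short descriptions; the LLM decides whether to
            # fetch any of them via expand_node() or to run a new search.
            "neighbors": [
                {"id": sib, "summary": nodes[sib]["summary"]}
                for sib in (parent["children"] if parent else [])
                if sib != node_id
            ],
        })
    return results


def expand_node(node_id: str, nodes: dict) -> dict:
    """Second tool: fetch the full content of a node the LLM asked about."""
    node = nodes[node_id]
    return {"id": node_id, "title": node.get("title"), "text": node["text"]}
```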

I've found this hybrid approach pretty good for the majority of use cases: BM25 (maybe SPLADE if you want a blend of bag-of-words/keyword) + vectors + RRF + re-rank works pretty damn well.
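For reference, reciprocal rank fusion is simple enough to sketch in a few lines. This is a minimal version that merges ranked id lists from BM25 and vector search before the re-rank step; k=60 is the constant commonly used for RRF:

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of doc ids into one list by RRF score."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# e.g. fused = rrf([bm25_ids, vector_ids]); then re-rank only the top slice.
```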

The trick that has elevated RAG, at least for my use cases, has been having different representations of your documents, as well as sending multiple permutations of the input query. Do as much as you can in the vector DB for speed. I'll sometimes have 10-11 different "batched" calls to our vector DB that are lightning quick. It also helps to be smart about what payloads I'm actually pulling, so that if I do use the LLM to re-rank at the end, I'm not blowing up the context.
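A rough sketch of that fan-out pattern. The `vector_db.batch_search` client and its arguments are hypothetical (most vector DBs offer some batched query API); the point is one round trip for all query variants, and minimal payloads until the final re-rank:

```python
def multi_query_search(vector_db, embed, query_variants: list[str], k: int = 20) -> list[dict]:
    """Fan out several permutations of the input query in a single batched call."""
    batches = vector_db.batch_search(
        vectors=[embed(q) for q in query_variants],
        limit=k,
        with_payload=["doc_id", "title"],  # ids and titles only, not full text
    )

    # Deduplicate across variants before fetching full text for the re-ranker.
    seen, merged = set(), []
    for hits in batches:
        for hit in hits:
            if hit["doc_id"] not in seen:
                seen.add(hit["doc_id"])
                merged.append(hit)
    return merged
```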

TLDR: Yes, you actually do have to put in significant work to build an efficient RAG pipeline, but that's fine and probably should be expected. And I don't think we are in a world yet where we can just "assume" that large context windows will be viable for really precise work, or that costs will drop to 0 anytime soon for those context windows.