I am so tired of these undifferentiated takes.
These types of articles regularly come from people who don't actually build LLM systems at scale, or from people who want to sell you on a new tech. And the frustrating thing is: they ain't even wrong.
Top-K RAG via vector search is not a sufficient solution. It never really was for most interesting use-cases.
Of course, take the easiest and most structured - in a sense perfectly indexed - data (code repos) and claim that "RAG is dead". Again. Now try this with billions of unstructured tokens where the LLM really needs to do something with the entire context (like confirming that something is NOT in the documents), where even the best LLM loses context coherence after roughly 64k tokens on complex tasks. Good luck!
The truth is: whether it's Agentic RAG, Graph RAG, or a combination of these with ye olde top-k RAG - it's still RAG. You are going to Retrieve, and then you are going to use a system of LLM agents to Generate stuff with it. You may now be able to do the first step smarter. It's still RAG tho.
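To make that concrete, here's a toy sketch of what I mean. Both shapes below are retrieve-then-generate; the "agentic" one just decides what to retrieve iteratively. vector_search and call_llm are hypothetical stubs for illustration, not any particular framework's API:

```python
# Toy sketch: "classic" and "agentic" RAG are both retrieve-then-generate.
# vector_search() and call_llm() are hypothetical stand-ins, not a real API.

def vector_search(query: str, k: int = 5) -> list[str]:
    """Hypothetical top-k vector retrieval over some corpus."""
    return [f"chunk relevant to {query!r} #{i}" for i in range(k)]

def call_llm(prompt: str) -> str:
    """Hypothetical LLM call."""
    return f"answer based on: {prompt[:60]}..."

def classic_rag(question: str) -> str:
    # One retrieval step, one generation step.
    context = "\n".join(vector_search(question))
    return call_llm(f"Context:\n{context}\n\nQuestion: {question}")

def agentic_rag(question: str, max_rounds: int = 3) -> str:
    # The "smarter first step": the model picks the next query itself,
    # over several rounds -- but it still retrieves, then generates.
    gathered: list[str] = []
    query = question
    for _ in range(max_rounds):
        gathered += vector_search(query, k=3)
        query = call_llm(
            f"Given {len(gathered)} chunks so far, what should we search next to answer: {question}?"
        )
    return call_llm("Context:\n" + "\n".join(gathered) + f"\n\nQuestion: {question}")

print(classic_rag("where is the retry config documented?"))
print(agentic_rag("where is the retry config documented?"))
```

Swap in smarter retrieval all you want; the overall loop is still R followed by AG.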
The latest Anthropic whoopsy showed that they also haven't solved the context rot issue. Yes, you can get a 1M-context scaled version of Claude, but then the small-scale/detail performance is so garbage that misrouted customers lose their effin minds.
"My LLM is just gonna ripgrep through millions of technical doc pdfs identified only via undecipherable number-based filenames and inconsistent folder structures"
lol, and also, lmao
I agree. Permit me to rephrase. From this learning adventure https://www.infoq.com/articles/architecting-rag-pipeline/ I came to understand what many now call context rot. If you want quality answers, you still need relevance reranking and filtering no matter how big your context window becomes. Whether that happens in an upfront search for a one-shot prompt or iteratively over a long session through an agentic system is merely an implementation detail.
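A minimal sketch of where that reranking/filtering step sits, regardless of window size. The word-overlap scorer is a toy stand-in for a real reranker model, just to show the shape of the step:

```python
# Sketch: score retrieved candidates against the query and drop the noise
# before generation. overlap_score() is a toy placeholder for a real
# cross-encoder / reranker (an assumption for illustration, not a recommendation).

def overlap_score(query: str, chunk: str) -> float:
    q, c = set(query.lower().split()), set(chunk.lower().split())
    return len(q & c) / max(len(q), 1)

def rerank_and_filter(query: str, candidates: list[str],
                      top_n: int = 3, min_score: float = 0.2) -> list[str]:
    scored = sorted(((overlap_score(query, c), c) for c in candidates), reverse=True)
    return [c for s, c in scored[:top_n] if s >= min_score]

candidates = [
    "retry configuration lives in config/retry.yaml",
    "unrelated marketing copy about synergy",
    "the retry policy defaults to three attempts",
]
print(rerank_and_filter("where is the retry configuration?", candidates))
```

Whether this runs once before a one-shot prompt or on every turn of an agent loop, the quality of the answer still hinges on what you let into the context.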