Isn't grep + LLM a form of RAG anyway?

Yes, this guy's post came up on my LinkedIn. I think it's helpful to consider the source with these types of articles: it's written by a CEO at a fintech startup (and looks AI generated too). It's obvious from reading the article that he doesn't understand what he's talking about and has likely never built any kind of RAG or other retrieval system. His experience is very limited, basically a single project building a system around rudimentary ingestion of SEC filings; that's the entire breadth of his technical experience on the subject. So take what you read with a grain of salt, and do your own research and testing.

It really depends on what you mean by RAG. If you take the acronym at face value, yeah.

However, RAG has been used as a stand-in for a specific design pattern where you retrieve data at the start of a conversation or request and then inject it into the request. This simple pattern has benefits compared to just sending a prompt by itself.
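To make the "push" version concrete, here's a rough sketch. The keyword retriever is a toy and `llm_complete` is a placeholder for whatever model client you use, not a real API:

```python
# Classic "push" RAG: retrieve once, up front, then inject the results into the prompt.
# The retriever here is a toy keyword scorer; llm_complete() is a hypothetical model client.

DOCS = [
    "Tartans are registered with the Scottish Register of Tartans.",
    "A tartan is a patterned cloth with crossing horizontal and vertical bands.",
]

def retrieve(query: str, top_k: int = 2) -> list[str]:
    words = set(query.lower().split())
    scored = sorted(DOCS, key=lambda d: len(words & set(d.lower().split())), reverse=True)
    return scored[:top_k]

def answer_with_rag(question: str) -> str:
    context = "\n".join(retrieve(question))            # one retrieval, before the model ever runs
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer using only the context."
    return llm_complete(prompt)                        # the model never gets to ask for more context
```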

The point the author is trying to make is that this pattern kind of sucks compared to agentic search, where instead of shoving a bunch of extra context in at the start, you give the model the ability to pull context in as needed. By switching from a "push" to a "pull" pattern, we allow the model to augment and clarify the queries it's making as it goes through a task, which in turn gives the model better data to work with (and thus better results).
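Same idea as a "pull" loop, sketched against a generic tool-calling interface (`call_model` and `search_docs` are hypothetical stand-ins, not any specific SDK):

```python
# "Pull" pattern: the model decides when and what to retrieve via a tool call,
# and can refine its query over several turns.
# call_model() and search_docs() are hypothetical placeholders for your LLM client and search backend.

TOOLS = [{
    "name": "search_docs",
    "description": "Search internal documents. Returns the top matching passages.",
    "parameters": {"query": {"type": "string"}},
}]

def answer_with_agentic_search(question: str, max_steps: int = 5) -> str:
    messages = [{"role": "user", "content": question}]
    for _ in range(max_steps):
        reply = call_model(messages, tools=TOOLS)        # model may answer or request a tool call
        if reply.tool_call is None:
            return reply.content                          # satisfied with the context it already has
        results = search_docs(reply.tool_call.arguments["query"])
        messages.append({"role": "tool", "content": results})  # pull in only what it asked for
    return call_model(messages).content
```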

I guess, but with a very basic form of exact-match retrieval. Embedding-based RAG tries to augment the prompt with extra data that is semantically similar instead of just exactly the same.
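Rough illustration of the difference: swap exact match for nearest-neighbour over embeddings. `embed` here is a placeholder for whatever embedding model you use:

```python
import numpy as np

# Semantic retrieval sketch: rank chunks by cosine similarity of their embeddings
# instead of exact keyword match. embed() is a hypothetical embedding-model call.

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def semantic_search(query: str, chunks: list[str], top_k: int = 3) -> list[str]:
    q = embed(query)
    scored = [(cosine(q, embed(c)), c) for c in chunks]
    return [c for _, c in sorted(scored, reverse=True)[:top_k]]
```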

Yeah 100%

Almost all tool calls would result in RAG.

"RAG is dead" just means rolling your own search and manually injecting results into context is dead (just use tools). It means the chunking techniques are dead.

Chunking is still relevant, because you want your tool calls to return results specific to the needs of the query.

If you want to know "how are tartans officially registered", you don't want to feed the entire 554 KB Wikipedia article on Tartan to your model, using 138,500 tokens, over 35% of GPT-5's context window, with significant monetary and latency cost. You want to feed it just the "Regulation > Registration" subsection and get an answer 1000x cheaper and faster.
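Even something as dumb as splitting on headings gets you most of the way there. Illustrative only; the regex assumes markdown-style headings and the names are made up:

```python
import re

# Section-aware chunking sketch: split a document at its headings so retrieval can
# return just the "Regulation > Registration" subsection instead of the whole article.

def chunk_by_heading(text: str) -> dict[str, str]:
    """Split markdown-ish text into {heading: body} chunks."""
    sections: dict[str, str] = {}
    current = "Introduction"                      # bucket for any text before the first heading
    for line in text.splitlines():
        m = re.match(r"#+\s+(.*)", line)          # treat any markdown heading as a boundary
        if m:
            current = m.group(1).strip()
            sections[current] = ""
        else:
            sections[current] = sections.get(current, "") + line + "\n"
    return sections
```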

But you could. For that example, you could just use a much cheaper model, since it's not that complicated a question, and pass it the entire article. Just use Gemini Flash, for example. Models will only get cheaper and context windows will only get bigger.

I've seen it called "agentic search", while RAG seems to have become synonymous with semantic search via embeddings.

That's a silly distinction to make, because there's nothing stopping you from giving an agent access to a semantic search.

If I make a semantic search over my organization's Policy As Code procedures or whatever and give it to Claude Code as an MCP tool, does Claude Code suddenly stop being agentic?
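And that setup is trivial to wire up. Something like this, assuming the official MCP Python SDK's FastMCP interface; `semantic_search` is a hypothetical helper over whatever embedding index you already have:

```python
# Minimal MCP server exposing a semantic search tool to an agent like Claude Code.
# Assumes the official MCP Python SDK (pip install mcp); semantic_search() is a
# hypothetical placeholder for your existing embedding index.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("policy-search")

@mcp.tool()
def search_policies(query: str, top_k: int = 3) -> list[str]:
    """Return the most relevant Policy As Code passages for a query."""
    return semantic_search(query, top_k=top_k)   # your embedding index goes here

if __name__ == "__main__":
    mcp.run()   # the agent decides when to call search_policies()
```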

Well yeah, RAG just specifies retrieval-augmented, not that vector retrieval or decoder retrieval was used.