Actually, chunking isn't such a bad problem with code: it chunks itself along function and class boundaries, and code embeddings produce better results. The real problem is that RAG is fiddly, and people try to just copy a basic template or use a batteries-included lib that's tuned for QA, which isn't going to produce good results.
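To make "it chunks itself" concrete, here's a minimal sketch of structure-aware chunking for Python files using the stdlib `ast` module, splitting on top-level functions and classes instead of fixed-size text windows (the example source is made up for illustration):

```python
# Split a Python source file into natural chunks: one per top-level
# function/class, plus a chunk for leftover module-level code.
import ast

def chunk_python_source(source: str) -> list[str]:
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks, covered = [], set()
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            start, end = node.lineno - 1, node.end_lineno  # 1-based -> slice bounds
            chunks.append("\n".join(lines[start:end]))
            covered.update(range(start, end))
    # Whatever wasn't inside a def/class (imports, constants) becomes one chunk.
    leftover = "\n".join(l for i, l in enumerate(lines) if i not in covered).strip()
    if leftover:
        chunks.append(leftover)
    return chunks

example = '''import os

def load(path):
    return open(path).read()

class Cache:
    def get(self, k):
        return None
'''
print(len(chunk_python_source(example)))  # 3 chunks: load, Cache, imports
```

A real version would also handle decorators and nested definitions, but the point stands: the language's own syntax gives you chunk boundaries for free.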

> Actually, chunking isn't such a bad problem with code, it chunks itself, and code embeddings produce better results.

I can't remember which post I read this in (it was on Hacker News), but apparently when designing Claude Code, Anthropic tried a RAG approach and it didn't work very well compared to loading in the full file. If my understanding of how Claude Code works is correct (this was based on comments from others), it "greps like an intern/junior developer". So what Claude Code does (assuming grep really is the key) is ask Sonnet for keywords to grep for based on the user's query, then continuously revise those keywords until it's satisfied with the files it found.
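The loop described above can be sketched roughly like this. Note this is a guess at the shape, not Anthropic's actual implementation: `ask_model_for_keywords` stands in for the real Sonnet call, and the corpus is an in-memory dict instead of a filesystem.

```python
# Hedged sketch of the "grep like a junior developer" loop: the model
# proposes keywords, we grep, the model sees the hits and either
# revises its keywords or signals it's satisfied.

def grep(corpus: dict[str, str], keyword: str) -> set[str]:
    """Return paths of files whose contents contain the keyword."""
    return {path for path, text in corpus.items() if keyword in text}

def search(corpus, query, ask_model_for_keywords, max_rounds=3):
    hits: set[str] = set()
    for _ in range(max_rounds):
        keywords = ask_model_for_keywords(query, sorted(hits))
        if not keywords:          # empty list = model is satisfied
            break
        for kw in keywords:
            hits |= grep(corpus, kw)
    return sorted(hits)

# Toy corpus and a deterministic stand-in for the model call.
corpus = {
    "auth.py": "def login(user): check_password(user)",
    "db.py": "def connect(): pass",
    "session.py": "def start_session(user): login_required()",
}

def fake_model(query, hits_so_far):
    if not hits_so_far:
        return ["login"]          # first guess from the query
    return []                     # satisfied with what grep found

print(search(corpus, "where is login handled?", fake_model))
```

The inefficiency the parent mentions is visible here: each round is a full scan plus a model round-trip, and nothing about the loop understands the code's intent, only its literal tokens.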

As ridiculous as this sounds, this approach is not horrible, albeit very inefficient. My approach focuses on capturing intent, which is what grep can't match. And with RAG, if the code isn't chunked correctly and/or the code is just badly organized, you may miss the true intent of the code.

Oh yeah, loading in full files when possible is great. I use Gemini Pro to look at bundles of my whole codebase, and the level of comprehension it gets from that is pretty shocking.
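The "bundle" step is simple enough to sketch. This is a bare-bones version (the extension filter and header format are my own choices; a real script would also respect `.gitignore` and enforce a size cap):

```python
# Concatenate a codebase into one text blob for a long-context model,
# with a path header before each file so the model can tell them apart.
from pathlib import Path

def bundle(root: str, extensions=(".py",)) -> str:
    parts = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in extensions:
            parts.append(f"===== {path} =====\n{path.read_text()}")
    return "\n\n".join(parts)
```

The resulting string is just pasted into (or uploaded with) the prompt; the model sees the whole repo at once, which is what makes the comprehension so good compared to retrieving fragments.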

This is why I think vector DBs are probably not going to be used for a lot of applications in the future. They served a very valid purpose when context windows were a lot smaller and LLMs weren't as good, but moving forward, I personally think they make less and less sense.

Vector DBs will still be around to do a first pass before feeding data into a long-context reasoner like Gemini in most cases. The thing that's going to go away is rerankers.
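That two-stage pattern looks roughly like this. A cheap similarity pass narrows a big corpus to the top-k chunks, and all k go straight into the long-context model's prompt with no reranker in between. The bag-of-words "embedding" below is a stand-in for a real embedding model, and the model call itself is omitted:

```python
# First pass: rank chunks by cosine similarity to the query, keep top-k,
# then stuff everything that survives into one long-context prompt.
import math
from collections import Counter

def embed(text: str) -> Counter:
    return Counter(text.lower().split())  # toy stand-in for real embeddings

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def first_pass(query: str, chunks: list[str], k: int = 2) -> list[str]:
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

chunks = [
    "login handler: validate credentials and start a session",
    "CSS grid layout helpers for the settings page",
    "session tokens are rotated on every login attempt",
]
top = first_pass("how does login work", chunks)
prompt = "Answer using this context:\n\n" + "\n\n".join(top)
print(top)
```

With a reranker you'd insert a second, more expensive scoring model between `first_pass` and the prompt; the parent's point is that a long-context reasoner can just read all k candidates itself, so that middle stage loses its job.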