Indeed. Dabbling in 'RAG' (which, for better or worse, has become a tag for anything context-retrieval-related) on more complex documentation and more intricate questions, you very quickly realize that you need to go far beyond simple 'chunking', and you end up with a subsystem that constructs more than one intricate knowledge graph to support the different kinds of questions users might ask. For example, a simple question such as "What exactly is an 'Essential Entity'?" is better handled by Knowledge Representation A, as opposed to "Can you provide a gap and risk analysis on my 2025 draft compliance statement (uploaded) in light of the current GDPR, NIS-2 and the AI Act?"
(My domain is regulatory compliance, so maybe this goes beyond pure documentation, but I'm guessing that, pushed far enough, the same complexities arise.)
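A toy sketch of that split, just to make the idea concrete. The representation names and the keyword heuristic are invented for illustration; a real router would more likely be an LLM call or a trained classifier than a startswith check.

    # Hypothetical sketch: pick a knowledge representation based on question type.
    # "glossary_graph" and "cross_document_graph" are placeholder names.
    def choose_representation(question: str) -> str:
        definitional = ("what is", "what exactly is", "define")
        if question.lower().startswith(definitional):
            return "glossary_graph"        # concept/definition lookups
        return "cross_document_graph"      # gap/risk analysis across many sources

    # "What exactly is an 'Essential Entity'?"      -> glossary_graph
    # "Can you provide a gap and risk analysis ..." -> cross_document_graph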
“It’s just a chat bot Michael, how much can it cost?”
A philosophy degree later…
I ended up just generating a summary of each of our 1k docs, using the summaries for retrieval, running a filter to confirm each retrieved doc is relevant, and finally using the actual doc to generate an answer.
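A minimal sketch of that pipeline, assuming an OpenAI-style client and embedding-based retrieval over the summaries; the model names, prompts, and helper functions are placeholders, not the actual setup described above.

    # Sketch: summarize each doc -> retrieve on summaries -> filter -> answer from full docs.
    from openai import OpenAI
    import numpy as np

    client = OpenAI()
    corpus = ["full text of doc 1 ...", "full text of doc 2 ..."]  # the ~1k documents

    def ask(model: str, prompt: str) -> str:
        resp = client.chat.completions.create(
            model=model, messages=[{"role": "user", "content": prompt}])
        return resp.choices[0].message.content

    def embed(text: str) -> np.ndarray:
        resp = client.embeddings.create(model="text-embedding-3-small", input=text)
        return np.array(resp.data[0].embedding)

    # Offline: one summary per document, embedded for retrieval.
    index = []
    for doc in corpus:
        summary = ask("gpt-4o-mini", f"Summarize this document in a few sentences:\n\n{doc}")
        index.append((doc, summary, embed(summary)))

    # Online: rank by summary similarity, confirm relevance, answer from the full docs.
    def answer(question: str, k: int = 5) -> str:
        q = embed(question)
        ranked = sorted(index, key=lambda item: -float(np.dot(item[2], q)))
        relevant = []
        for doc, summary, _ in ranked[:k]:
            verdict = ask("gpt-4o-mini",
                          f"Question: {question}\nSummary: {summary}\n"
                          "Does this document help answer the question? Answer yes or no.")
            if verdict.strip().lower().startswith("yes"):
                relevant.append(doc)
        context = "\n\n---\n\n".join(relevant)
        return ask("gpt-4o", "Using only the context below, answer the question.\n\n"
                             f"Context:\n{context}\n\nQuestion: {question}")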
This is sort of hilarious: to use an LLM as a good search interface, you first have to build... a search engine.
I guess this is why Kagi Quick Answer has consistently been one of the best AI tools I use. The search is good, so their agent is getting the best context for the summaries. Makes sense.
It is building a system that amplifies the strengths of the LLM by feeding it the right knowledge in the right format at inference time. Context design is both a search problem (using 'search' as a generic term for everything retrieval) and a representation problem.
Just dumping raw reams of text into the 'prompt' isn't the best way to get great results. Now, I'm fully aware that anything I can do on my side of the API, the LLM provider can, and eventually will, do as well. After all, search also evolved beyond 'PageRank' into thousands of specialized heuristic subsystems.