> An interesting alternative I've been meaning to try out is inverting this flow.

This is what I am doing with my AI Search Assistant feature, which I discuss in more detail at the link below:

https://github.com/gitsense/chat/blob/main/packages/chat/wid...

By default, I provide what I call a "Tiny Overview Analyzer". You can read the Analyzer's prompt at the link below:

https://github.com/gitsense/chat/blob/main/packages/chat/wid...

In a nutshell, it generates a very short summary of every document along with keywords. The basic idea is to use BM25 ranking to identify the most relevant documents for the AI to review. For example, my use case is to understand how Aider, Claude Code, etc., store their conversations so that I can make them readable in my chat app. To answer this, I would ask "How does Aider store conversations?" and the LLM would construct a deterministic keyword search from terms most likely to surface files related to conversation storage.
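For the curious, here's a rough sketch of what that ranking step could look like. The rank_bm25 library, the file paths, and the summary text are all illustrative, not what the app actually uses:

    # pip install rank-bm25
    from rank_bm25 import BM25Okapi

    # Illustrative "Tiny Overview" output: one short summary + keywords per file.
    docs = {
        "aider/history.py": "Manages chat history persistence. Keywords: conversation, save, markdown",
        "aider/io.py": "Terminal input/output helpers. Keywords: prompt, console",
        "aider/main.py": "CLI entry point. Keywords: argparse, startup",
    }

    paths = list(docs)
    bm25 = BM25Okapi([docs[p].lower().split() for p in paths])

    # Keywords the LLM might derive from "How does Aider store conversations?"
    query = "store save conversation history persistence".split()
    scores = bm25.get_scores(query)

    # The top-scoring files become the candidates for closer review.
    for path, score in sorted(zip(paths, scores), key=lambda x: -x[1]):
        print(f"{score:6.2f}  {path}")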

Once I have the list of files, the LLM is asked again to review the summaries of all matches and suggest which documents should be loaded in full for further review. However, I've found this two-step approach to be inconsistent.

What works much better in practice is simply loading the "Tiny Overview" summaries into context and chatting with the LLM. For example, I would ask the same question: "Which files do you think can tell me how Aider stores conversations? Identify up to 20 files and create a context bundle for them so I can load them into context." For a thousand files, you can easily fit three-sentence summaries for all of them without overwhelming the LLM. Once I have my answer, a few clicks load the files into context, and the LLM then has full access to the file content and can answer my question much better.
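In pseudo-form, that "summaries in context" step is just one big prompt. The helper below is a hypothetical sketch; the function name and summary format are mine, not the app's:

    # Hypothetical sketch: pack every tiny summary into a single prompt and
    # let the model pick the files directly, skipping the keyword-search step.
    def build_prompt(question: str, summaries: dict[str, str]) -> str:
        listing = "\n".join(f"{path}: {text}" for path, text in summaries.items())
        return (
            "Below are short summaries of every file in the repository.\n\n"
            f"{listing}\n\n"
            f"Question: {question}\n"
            "Identify up to 20 files that most likely answer the question "
            "and list their paths so they can be loaded in full."
        )

    summaries = {
        "aider/history.py": "Persists chat history to markdown transcript files.",
        "aider/io.py": "Terminal input/output helpers.",
    }
    print(build_prompt("How does Aider store conversations?", summaries))

Back-of-envelope: at roughly 50-80 tokens per three-sentence summary, a thousand files come to well under 100k tokens, which fits in most current large-context models.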