I work at an AI startup, and we've explored a solution where we preprocess documents to make a short summary of each one, then provide these summaries, along with a tool-call instruction, to the bot so it can decide which document is relevant. This scales to a few hundred documents of 100k-1m tokens, but past that we run into context window limits and context rot. I've thought about extending this into a tree-based structure, kind of like an LLM file system, but have other priorities at the moment. A rough sketch of the flat version is below.
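Here's a minimal sketch of that summary-index pattern, assuming an OpenAI-style tool-calling API. The document IDs, summaries, and the `fetch_document` tool name are made up for illustration, not our actual implementation:

```python
# Sketch of the summary-index approach: the model sees only short summaries
# and requests the one full document it needs via a tool call.
# Assumes an OpenAI-style API; all names below are illustrative.
import json
from openai import OpenAI

client = OpenAI()

# Precomputed offline: doc_id -> short summary of each large document.
summaries = {
    "pump_manual_v3": "Installation and maintenance for the X200 pump line.",
    "plc_reference": "Register map and fault codes for the on-site PLCs.",
}

tools = [{
    "type": "function",
    "function": {
        "name": "fetch_document",
        "description": "Load the full text of one document into context.",
        "parameters": {
            "type": "object",
            "properties": {"doc_id": {"type": "string", "enum": list(summaries)}},
            "required": ["doc_id"],
        },
    },
}]

system = "Pick the most relevant document, then answer.\n" + "\n".join(
    f"- {doc_id}: {summary}" for doc_id, summary in summaries.items()
)

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "system", "content": system},
              {"role": "user", "content": "What does fault code E17 mean?"}],
    tools=tools,
)
call = resp.choices[0].message.tool_calls[0]
doc_id = json.loads(call.function.arguments)["doc_id"]
# ...then load that document's full text and continue the conversation.
```

The scaling ceiling is visible here: every summary lives in the system prompt, so a few hundred documents' worth of summaries is itself a large context. The tree idea would replace the flat list with nested "directory" summaries the model descends through.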
Embeddings had context-size limitations in our case, since we were working with large technical manuals. Gemini was the first model with a 1m-token context window, but for some reason its embedding model's input window is tiny. I suspect embeddings start to break down when a single vector has to represent too much information.
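To make the mismatch concrete, here's what a small embedding window forces on a large manual. The 2,048-token limit and overlap are assumptions about a typical embedding model, not measured numbers:

```python
# Illustrative only: chunking a ~1M-token manual into windows that fit a
# small embedding input limit. Each chunk gets its own vector and sees only
# a tiny slice of the document, so cross-section context is lost.
def chunk_tokens(tokens: list[str], window: int = 2048, overlap: int = 128):
    """Split a long token sequence into overlapping windows."""
    step = window - overlap
    for start in range(0, len(tokens), step):
        yield tokens[start:start + window]

manual = ["tok"] * 1_000_000          # stand-in for a 1M-token manual
print(sum(1 for _ in chunk_tokens(manual)))  # ~521 separate vectors
```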