> You want predictable, linear pricing of course, but sometimes you want to simply be able to get a predictably better response by throwing a bit more money/compute time at it.

Through more thorough ANN vector search / higher recall, or would it also require different preprocessing?

Honestly I don’t know the best answer, but my sense is there’s something important in the direction the OP is going: i.e. moving away from vector search and preprocessing toward dynamic exploration of the document space by an agent. Ultimately, if the content in one’s corpus develops linearly (things build one after another), no vector search will ever work on its own: however exhaustive the retrieval, you just get a list of every passage directly relevant to the question, but not how those passages relate to the text before or after them.

GraphRAG gets around this by preprocessing “narrative” summaries of pretty much every combination of topics in a document: vector search then returns a mix of individual topic descriptions, relations between topics, raw excerpts from the data, and these overarching narratives. That works pretty well in general, but many of the narratives turn out to be useless for the questions that actually matter, and the preprocessing is expensive.
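To make the mixed-retrieval idea concrete, here’s a minimal sketch of searching several precomputed pools (topic descriptions, relations, raw excerpts, narratives) and concatenating the hits. The pool names, the `(text, vector)` layout, and the cosine search are illustrative assumptions, not GraphRAG’s actual API.

```python
import math

def cosine(a, b):
    # plain cosine similarity between two dense vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec, pool, k=2):
    # pool: list of (text, vector) pairs; return the k closest texts
    return [t for t, _ in sorted(pool, key=lambda p: -cosine(query_vec, p[1]))[:k]]

def retrieve(query_vec, pools, k=2):
    # pools: dict mapping pool name (e.g. "topics", "relations",
    # "excerpts", "narratives") to its (text, vector) entries.
    # The final context is just the union of per-pool hits.
    context = []
    for name, pool in pools.items():
        context += [f"[{name}] {t}" for t in top_k(query_vec, pool, k)]
    return context
```

The point of the structure is that a single query fans out over heterogeneous indexes, so the answer context mixes granularities rather than being a flat list of passages.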

I think the area that hasn’t been explored enough is generating these narratives dynamically, i.e. more or less as the OP does: having the agent simulate reading through every document with a question in mind and a log of possibly relevant issues. Obviously that’s expensive per query, but if you can get the right answer to an important question for less than the cost of a human’s time, it’s worth it. GraphRAG preprocessing costs a lot up front (the cost grows steeply with corpus size), and that cost doesn’t guarantee a good answer to any particular question.
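The “simulated reading” loop above can be sketched roughly like this: the agent walks the chunks in order, carrying the question and an accumulating log of notes, so a later passage can be interpreted in light of everything before it. `ask_llm` is a hypothetical stand-in for whatever model call you’d actually use; here it’s a stub so the skeleton runs.

```python
def ask_llm(prompt: str) -> str:
    # Stub standing in for a real LLM API call (an assumption, not a
    # real library); replace with your model client of choice.
    return "..."

def agentic_read(chunks: list[str], question: str) -> str:
    notes: list[str] = []  # running log of possibly relevant issues
    for i, chunk in enumerate(chunks):
        # Each step sees the question, the notes so far, and one passage,
        # and appends an updated note relating this passage to prior ones.
        note = ask_llm(
            f"Question: {question}\n"
            f"Notes so far: {notes}\n"
            f"Passage {i}: {chunk}\n"
            "Note anything relevant and how it relates to earlier passages."
        )
        notes.append(note)
    # Final synthesis over the whole log, not over isolated passages.
    return ask_llm(f"Question: {question}\nNotes: {notes}\nAnswer:")
```

The per-query cost is one model call per chunk plus a synthesis call, which is exactly the linear-in-corpus expense the paragraph above trades against preprocessing.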