>"Retrieval based on reasoning — say goodbye to approximate semantic search ("vibe retrieval"

How is this not precisely "vibe retrieval" and much more approximate, where approximate in this case is uncertainty over the precise reasoning?

Similarity with conversion to high-dimensional vectors and then something like kNN seems significantly less approximate, less "vibe" based, than this.
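
The baseline I mean is nothing fancier than this (a minimal sketch; embed() here is a hashed bag-of-words stand-in for whatever real embedding model you'd use):

    import numpy as np

    def embed(texts, dim=256):
        # Stand-in for a real embedding model (OpenAI, sentence-transformers, ...):
        # hashed bag-of-words, unit-normalized, just so the sketch runs end to end.
        vecs = np.zeros((len(texts), dim))
        for i, t in enumerate(texts):
            for tok in t.lower().split():
                vecs[i, hash(tok) % dim] += 1.0
        return vecs / np.maximum(np.linalg.norm(vecs, axis=1, keepdims=True), 1e-9)

    def knn_search(query, chunks, chunk_vecs, k=5):
        # Cosine similarity collapses to a dot product once vectors are unit length.
        scores = chunk_vecs @ embed([query])[0]
        top = np.argsort(-scores)[:k]
        return [(chunks[i], float(scores[i])) for i in top]

    chunks = ["John on the European economy in 2025 ...", "Q3 2024 outlook ..."]
    hits = knn_search("John's view of the European economy in 2025", chunks, embed(chunks))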

This also appears to be completely predicated on pre-enrichment of the documents by adding structure through API calls to, in the example, OpenAI.

It doesn't at all seem accurate to:

1: Toss out mathematical similarity calculations

2: Add structure with LLMs

3: Use LLMs to traverse the structure

4: Label this as less vibe-ish

Also, for any sufficiently large set of documents (or sufficiently fine granularity on smaller sets), scaling will become problematic as the doc structure approaches the context limit of the LLM doing the retrieval.

I work in this field, so I can answer.

Embeddings are great at basic conceptual similarity, but in quality maximalist fields and use cases they fall apart very quickly.

For example:

"I want you to find inconsistencies across N documents." There is no concept of an inconsistency in an embedding. However, a textual summary or context stuffing entire documents can help with this.

"What was John's opinion on the European economy in 2025?" It will find a similarity to things involving the European economy, including lots of docs from 2024, 2023, etc. And because of chunking strategies with embeddings and embeddings being heavily compressed representations of data, you will absolutely get chunks from various documents that are not limited to 2025.

"Where are Sarah or John directly quoted in this folder full of legal documents?" Sarah and John might be referenced across many documents, but finding where they are directly quoted is nearly impossible even in a high dimensional vector.

Embeddings are awesome, and great for some things like product catalog lookups and other fun stuff, but for many industries the mathematical cosine-similarity approach is just not effective.
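
For the first example, the context-stuffing workaround is literally just this kind of thing (rough sketch; the client usage and model name are illustrative):

    from openai import OpenAI

    client = OpenAI()

    def find_inconsistencies(docs):
        # No embedding encodes "inconsistency"; the model has to see the texts together.
        joined = "\n\n".join(f"[Doc {i + 1}]\n{d}" for i, d in enumerate(docs))
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # illustrative; any long-context model
            messages=[{"role": "user",
                       "content": "List factual inconsistencies between these documents, "
                                  "citing doc numbers:\n\n" + joined}],
        )
        return resp.choices[0].message.content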

> Embeddings are great at basic conceptual similarity, but in quality maximalist fields and use cases they fall apart very quickly.

This makes a lot of sense if you think about it. You want something as conceptually similar to the correct answer as possible. But with vector search, you are looking for something conceptually similar to some formulation of the question, which has some loose correlation, but is very much not the same thing.

There are ways you can prepare data to get a closer approximation (e.g. you can have an LLM formulate, for each indexed block, the questions it could answer and index those; then you're searching for material that answers a question similar to the one being asked, which is a bit closer to what you want), but it's still an approximation.
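
Concretely, that preparation step looks roughly like this (a sketch; the prompt wording and model names are mine):

    from openai import OpenAI

    client = OpenAI()

    def index_block(block_text):
        # Ask an LLM which questions the block answers, then embed those questions
        # (pointing back at the block) instead of only the raw text.
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # illustrative
            messages=[{"role": "user",
                       "content": "List 3 questions this passage answers, one per line:\n\n"
                                  + block_text}],
        )
        questions = [q.strip() for q in resp.choices[0].message.content.splitlines() if q.strip()]
        vecs = client.embeddings.create(model="text-embedding-3-small", input=questions).data
        return [(q, v.embedding, block_text) for q, v in zip(questions, vecs)]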

But if you know ahead of time, from experience, the salient features of the dataset that are useful for the particular application, and can index those directly, it just makes sense that while this will be more labor-intensive than generalized vector search and may generalize less well outside that particular use case, it will also, more often than not, be more useful in the intended use case.

Yes, sure, vector similarity has limits, but does this address PageIndex's approach to those limits? I mean, beyond the approach of "add structure with recursive LLM API calls, show the LLM that structure to search". I don't see where PageIndex is doing more than this.

It is just as "vibe-ish" as vector search and notably does require chunking (document chunks are fed to the indexer to build the table of contents). That said, I don't find vector search any less "vibey". While "mathematical similarity" is a structured operation, the "conversion to high-dimensional vectors" part is predicated on the encoder, which can be trained towards any objective.

    > scaling will become problematic as the doc structure approaches the context limit of the LLM doing the retrieval
IIUC, retrieval is based on traversing a tree structure, so only the root nodes have to fit in the context window. I find that kinda cool about this approach.
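
What I had pictured is something like this (my sketch of the idea, not their code):

    def traverse(node, query, ask_llm):
        # Descend one level at a time: only the current node's children
        # (titles + summaries) ever need to fit in the prompt.
        while node.get("children"):
            menu = "\n".join(f'{i}: {c["title"]} - {c["summary"]}'
                             for i, c in enumerate(node["children"]))
            choice = ask_llm(f"Question: {query}\n\nWhich section is most relevant?\n"
                             f"{menu}\nAnswer with the number only.")
            node = node["children"][int(choice.strip())]
        return node  # leaf most likely to contain the answer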

But yes, still "vibe retrieval".

It doesn't look like it's just root nodes from the structure, it appears to be the entire structure including a summary and excluding the text content itself:

    {json.dumps(tree_without_text, indent=2)}

The end result is that a pre-summarized digest goes into each prompt, and the LLM selects whichever nodes it decides are relevant.
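
In other words the selection prompt is built roughly like this (my paraphrase of the code; tree_without_text is from their snippet, the other names are mine):

    import json

    def strip_text(node):
        # Keep titles, summaries, and ids; drop the raw page text before prompting.
        return {k: ([strip_text(c) for c in v] if k == "children" else v)
                for k, v in node.items() if k != "text"}

    def build_retrieval_prompt(tree, query):
        tree_without_text = strip_text(tree)
        return (f"Question: {query}\n\n"
                f"Document structure:\n{json.dumps(tree_without_text, indent=2)}\n\n"
                "Return the node ids most likely to contain the answer.")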

The PageIndex value-add here is ostensibly the creation of that summary structure, but this too is done with LLM assistance. I've been through the code now, and what I see is essentially JSON creation and parsing during the index process, with LLM prompts as the creation engine for all of that as well.

Yes, it is technically vectorless-RAG, but it gets there completely and totally with iterative and recursive calls to an LLM on all sides.

Looking through the rest of their code & API, the API exists to do these things:

    1: Create your ToC using unsupervised[1] LLM calls.
    2: Serve your ToC to an LLM when searching or querying your doc base
    3: Be your document store to return hits from #2
[1] Unsupervised in the ML sense, not as a value/quality judgement.
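
Step 1 is where the heavy lifting happens, and it boils down to something like this (a sketch; the prompts, field names, and thresholds are mine):

    from openai import OpenAI

    client = OpenAI()

    def summarize(text):
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # illustrative
            messages=[{"role": "user", "content": "Summarize in two sentences:\n\n" + text[:8000]}],
        )
        return resp.choices[0].message.content

    def split_into_sections(text):
        # In the real thing this is another LLM call; a naive split keeps the sketch runnable.
        mid = len(text) // 2
        return [("part 1", text[:mid]), ("part 2", text[mid:])]

    def build_toc(text, title="root", depth=0, max_depth=2):
        # Recursively summarize and split: every node costs at least one API call,
        # and the resulting JSON tree is the "index" pasted into retrieval prompts.
        node = {"title": title, "summary": summarize(text), "text": text, "children": []}
        if depth < max_depth and len(text) > 4000:
            for child_title, child_text in split_into_sections(text):
                node["children"].append(build_toc(child_text, child_title, depth + 1, max_depth))
        return node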

> This also appears to be completely predicated on pre-enrichment of the documents by adding structure through API calls to, in the example, OpenAI.

That was my immediate take. [Look at the summary and answer based on where you expect the data to be found] maybe works well for reliably structured data.
