A suspicious lack of any performance metrics on the many standard RAG/QA benchmarks out there, except for their highly fine-tuned and dataset-specific MAFIN2.5 system. I would love to see this approach compared against a similarly well-tuned structured hybrid retriever (vector similarity + text matching), which is the common way of building domain-specific RAG. The FinanceBench GPT-4o+Search baseline never specifies its retrieval approach [1,2], so I will have to assume it is the dumbest retriever possible, used to oversell the improvement.
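For concreteness, this is the kind of baseline I have in mind: a minimal sketch of a hybrid retriever that fuses dense vector similarity with BM25 text matching via reciprocal rank fusion. The embedding model, the toy chunks, and the fusion constant are my own illustrative choices, not anything from the MAFIN2.5 write-up.

```python
# Minimal hybrid retrieval sketch: dense cosine similarity + BM25, fused with
# reciprocal rank fusion. Model name and corpus are placeholders.
import numpy as np
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer

chunks = [
    "Q3 revenue grew 12% year over year, driven by services.",
    "Operating margin contracted due to higher input costs.",
    "The board approved a $2B share repurchase program.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
chunk_vecs = model.encode(chunks, normalize_embeddings=True)
bm25 = BM25Okapi([c.lower().split() for c in chunks])

def hybrid_search(query: str, k: int = 2, rrf_k: int = 60):
    # Dense ranking: cosine similarity (embeddings are L2-normalized).
    dense_scores = chunk_vecs @ model.encode(query, normalize_embeddings=True)
    dense_rank = np.argsort(-dense_scores)
    # Sparse ranking: BM25 over whitespace tokens.
    sparse_rank = np.argsort(-bm25.get_scores(query.lower().split()))
    # Reciprocal rank fusion of the two rankings.
    fused = {}
    for rank_list in (dense_rank, sparse_rank):
        for rank, idx in enumerate(rank_list):
            fused[idx] = fused.get(idx, 0.0) + 1.0 / (rrf_k + rank + 1)
    top = sorted(fused.items(), key=lambda x: -x[1])[:k]
    return [chunks[i] for i, _ in top]

print(hybrid_search("how much did revenue increase?"))
```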
PageIndex does not state to what degree the semantic structuring is rule-based (following the document structure) or also inferred by an ML model. In any case, structuring chunks along the semantic document structure is nothing new and pretty common, as is adding generated titles and summaries to the chunk nodes. But I find it dubious that prompt-based retrieval over structured chunk metadata works robustly, and if it does perform well, it is because of the extra prompt-engineering work put into chunk metadata generation and retrieval. This introduces two LLM-based components that can produce highly variable output versus a traditional vector chunker and retriever (see the sketch below). There are many more knobs to tune in a text prompt and an LLM-based chunker than in a sentence/paragraph chunker and a vector+text similarity hybrid retriever.
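To make the two LLM-based components concrete, here is roughly how I read the setup, with the model call stubbed out. The node schema and both prompts are my guesses, not anything PageIndex documents; the point is how many free-text knobs sit in this path.

```python
# Rough sketch of the two LLM-dependent stages as I understand them; the node
# schema and both prompts are guesses, not PageIndex's actual implementation.
from dataclasses import dataclass

def llm(prompt: str) -> str:
    # Stand-in for whatever model call is actually used; each call is a knob
    # (prompt wording, model choice, temperature) that affects output.
    raise NotImplementedError("plug in your LLM client here")

@dataclass
class ChunkNode:
    section_path: str          # e.g. "10-K > Item 7 > Liquidity"
    text: str
    title: str = ""
    summary: str = ""

def enrich(node: ChunkNode) -> ChunkNode:
    # LLM-based component #1: generated title and summary on every node.
    node.title = llm(f"Give a short title for this passage:\n{node.text}")
    node.summary = llm(f"Summarize this passage in one sentence:\n{node.text}")
    return node

def select_nodes(query: str, nodes: list[ChunkNode], k: int = 3) -> list[ChunkNode]:
    # LLM-based component #2: prompt the model with node metadata only and ask
    # it to pick relevant nodes, instead of scoring against an index.
    listing = "\n".join(
        f"[{i}] {n.section_path} | {n.title} | {n.summary}"
        for i, n in enumerate(nodes)
    )
    answer = llm(
        f"Question: {query}\nNodes:\n{listing}\n"
        f"Return the indices of the {k} most relevant nodes, comma-separated."
    )
    picked = [int(s) for s in answer.split(",") if s.strip().isdigit()]
    return [nodes[i] for i in picked[:k]]
```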
You will have to test retrieval and generation performance for your application regardless, but with this many LLM-based components that means increased iteration time and cost versus embeddings. The likely advantage of PageIndex is that you can make it very domain-specific. Claims of improved retrieval time are dubious: vector databases (even with hybrid search) are highly efficient, certainly more efficient than prompting an LLM to select relevant nodes.
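To put the retrieval-time point in perspective, even brute-force exact search over a six-figure chunk count is a single matrix-vector product; a real vector database with an ANN index and hybrid search is faster still, while LLM node selection adds at least one full model round-trip on top. The corpus size and dimension below are arbitrary.

```python
# Back-of-envelope timing for exact dense retrieval; sizes are arbitrary.
import time
import numpy as np

n_chunks, dim = 100_000, 768
index = np.random.randn(n_chunks, dim).astype(np.float32)
index /= np.linalg.norm(index, axis=1, keepdims=True)
query = np.random.randn(dim).astype(np.float32)
query /= np.linalg.norm(query)

t0 = time.perf_counter()
scores = index @ query                      # cosine similarity, exact search
top_k = np.argpartition(-scores, 10)[:10]   # top-10 without a full sort
elapsed_ms = (time.perf_counter() - t0) * 1000
print(f"exact top-10 over {n_chunks} chunks: {elapsed_ms:.1f} ms")
```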
[1] https://pageindex.ai/blog/Mafin2.5
[2] https://github.com/VectifyAI/Mafin2.5-FinanceBench