What about re-ranking? In my limited experience, adding fast, cheap re-ranking (e.g. Cohere) on top of the query results took an okay vector-based search and made the top 1-5 results much stronger.
Reranking is definitely the way to go. We personally found common reranker models to be a little too opaque (can't explain to the user why this result was picked) and not quite steerable enough, so we just use another LLM for reranking.
We open-sourced our impl just this week: https://github.com/with-logic/intent
We use Groq with gpt-oss-20b, which gives great results and only adds ~250ms to the processing pipeline.
If you use mini / flash models from OpenAI / Gemini, expect it to be 2.5s-3s of overhead.
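A minimal sketch of the LLM-as-reranker idea (the names, prompt shape, and `score_fn` wrapper are illustrative assumptions, not the linked library's API): the LLM judges each candidate against the query and returns a score plus a short reason, which is what gives you the explainability that off-the-shelf reranker models lack.

```python
from typing import Callable

def llm_rerank(query: str,
               candidates: list[str],
               score_fn: Callable[[str, str], tuple[float, str]],
               top_k: int = 5) -> list[tuple[str, float, str]]:
    """Rerank candidates with an LLM judge.

    score_fn is a hypothetical wrapper around your LLM call (e.g. Groq +
    gpt-oss-20b) that returns (relevance score 0-1, human-readable reason).
    Returns the top_k candidates as (doc, score, reason) tuples.
    """
    scored = [(doc, *score_fn(query, doc)) for doc in candidates]
    scored.sort(key=lambda t: t[1], reverse=True)
    return scored[:top_k]

# Stub judge for illustration only: a real implementation would prompt the
# LLM for something like {"score": ..., "reason": ...} per (query, doc) pair.
def toy_judge(query: str, doc: str) -> tuple[float, str]:
    overlap = len(set(query.lower().split()) & set(doc.lower().split()))
    return overlap / max(len(query.split()), 1), f"{overlap} query terms matched"
```

Since the judge calls are independent per document, they can be fired in parallel, which is how the latency stays in the hundreds-of-milliseconds range rather than scaling with candidate count.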
Query expansion works better.
Query expansion and re-ranking can, and often do, coexist.
Roughly, first there is the query analysis/manipulation phase, where you might have NER, spell check, query expansion/relaxation, etc.
Then there is the selection phase, where you retrieve all items that are relevant. Sometimes people will bring in results from both text- and vector-based indices, perhaps with an additional layer to group results.
Then finally you have the reranking layer, using a cross-encoder model which might even have some personalisation in the mix.
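The three phases above can be sketched end to end. Everything here is a stand-in under stated assumptions: `expand`, the retrievers, and `rerank_score` would in practice be a real query expander, text/vector indices, and a cross-encoder model respectively.

```python
from typing import Callable

def search_pipeline(raw_query: str,
                    expand: Callable[[str], list[str]],
                    retrievers: list[Callable[[str], list[str]]],
                    rerank_score: Callable[[str, str], float],
                    top_k: int = 10) -> list[str]:
    # Phase 1: query analysis/manipulation (NER, spell check,
    # expansion/relaxation would all happen here)
    queries = [raw_query] + expand(raw_query)

    # Phase 2: selection -- union the candidates from each index
    # (e.g. one text/BM25 retriever and one vector retriever)
    candidates: dict[str, None] = {}  # dict used as an ordered de-dup set
    for retrieve in retrievers:
        for q in queries:
            for doc in retrieve(q):
                candidates[doc] = None

    # Phase 3: rerank the merged pool against the *original* query
    # with a cross-encoder-style scorer
    return sorted(candidates,
                  key=lambda d: rerank_score(raw_query, d),
                  reverse=True)[:top_k]
```

Note the rerank step scores against the raw query, not the expanded ones: expansion is only there to widen the candidate pool, and the expanded variants shouldn't influence the final ordering.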
Also, with vector search you might not necessarily need query expansion, since semantic similarity already captures loose associations. But every domain is unique and there’s only one way to find out.
Query expansion happens before the retrieval query; reranking is applied after the ranked results are returned. Both are important.
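As a concrete example of the "before retrieval" half (the synonym table is a toy assumption; real systems might expand via embeddings, an LLM, or a curated thesaurus), expansion just rewrites the query string before it ever hits the index:

```python
# Hypothetical synonym table for illustration only.
SYNONYMS = {"car": ["automobile", "vehicle"]}

def expand_query(query: str) -> str:
    """Rewrite each term with its synonyms as an OR-group,
    e.g. 'cheap car' -> 'cheap (car OR automobile OR vehicle)'."""
    parts = []
    for term in query.split():
        alts = SYNONYMS.get(term, [])
        parts.append(f"({' OR '.join([term] + alts)})" if alts else term)
    return " ".join(parts)
```

The expanded string is what gets sent to the retrieval query; reranking then operates on the result list that comes back.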