Hacker News

More than that, adding longer context isn’t free in either time or money. So filling an LLM context with k=100 documents of mixed relevance may be slower than reranking and filling it with k=10 documents of high relevance.

Of course, the devil is in the details and there are five dozen reasons why you might choose one approach over the other. But it is not clear that using a reranker is always slower.
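To make the trade-off concrete, here is a rough back-of-envelope sketch. It assumes latency is dominated by LLM prefill (linear in prompt tokens) plus a reranker that scores documents at a fixed rate. The function names and every number in it (500 tokens per document, 5,000 prefill tokens/sec, 200 or 5 reranked docs/sec) are made-up placeholders, not measurements of any real model, so plug in your own.

```python
def prefill_seconds(num_docs, tokens_per_doc, prefill_tokens_per_sec):
    """Time for the LLM to ingest num_docs documents (simple linear model)."""
    return num_docs * tokens_per_doc / prefill_tokens_per_sec


def rerank_seconds(num_docs, rerank_docs_per_sec):
    """Time for a reranker to score num_docs candidates at a fixed rate."""
    return num_docs / rerank_docs_per_sec


def compare(candidates=100, kept=10, tokens_per_doc=500,
            prefill_tokens_per_sec=5000.0, rerank_docs_per_sec=200.0):
    """Return (seconds without reranking, seconds with reranking).

    All defaults are hypothetical placeholders for illustration only.
    """
    # Option A: stuff all candidates straight into the context.
    no_rerank = prefill_seconds(candidates, tokens_per_doc,
                                prefill_tokens_per_sec)
    # Option B: score all candidates, then only prefill the top `kept`.
    with_rerank = (rerank_seconds(candidates, rerank_docs_per_sec)
                   + prefill_seconds(kept, tokens_per_doc,
                                     prefill_tokens_per_sec))
    return no_rerank, with_rerank


if __name__ == "__main__":
    fast = compare()                          # assumed fast reranker
    slow = compare(rerank_docs_per_sec=5.0)   # assumed slow reranker
    print("fast reranker (no rerank, with rerank):", fast)
    print("slow reranker (no rerank, with rerank):", slow)
```

With the fast-reranker placeholders the reranked path comes out ahead, and with the slow-reranker placeholders it comes out behind, which is the point: it depends on the details. The model ignores cost per token, generation time, batching, and answer quality, all of which can move the answer.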