Their tool code use makes a lot of sense, but I don’t really get their tool search approach.
We originally had RAG as a form of search to discover potentially relevant information for the context. Then with MCP we moved away from that and instead dumped all the tool descriptions into the context and let the LLM decide, and it turned out this was way better and more accurate.
Now it seems like the basic MCP approach leads to the LLM’s context window filling up because it gets flooded with too many tool descriptions. So now we are back to calling search (not RAG but something else) to determine what’s potentially relevant.
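To make the idea concrete, here is a toy sketch of what "search before loading tools" could look like. I don't know what they actually use for tool search, so this just assumes plain word overlap between the query and each description. The tool names, the catalog, and the `search_tools` / `build_context` helpers are all made up for illustration.

```python
import re

# Hypothetical tool catalog; names and descriptions are invented for this sketch.
TOOLS = {
    "get_weather": "Look up the current weather forecast for a city",
    "send_email": "Send an email message to a recipient",
    "create_invoice": "Create a billing invoice for a customer",
    "search_flights": "Search available flights between two airports",
}


def tokenize(text):
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search_tools(query, tools=TOOLS, k=2):
    """Return up to k tool names ranked by word overlap with the query.

    This is naive keyword matching (an assumption), not the real mechanism.
    """
    query_words = tokenize(query)
    scored = []
    for name, description in tools.items():
        tool_words = tokenize(name.replace("_", " ") + " " + description)
        score = len(query_words & tool_words)
        if score > 0:
            scored.append((score, name))
    # Highest score first; break ties alphabetically so results are stable.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored[:k]]


def build_context(query, tools=TOOLS, k=2):
    """Put only the matching tool descriptions into the prompt text."""
    return "\n".join(f"{name}: {tools[name]}" for name in search_tools(query, tools, k))


if __name__ == "__main__":
    print(build_context("what is the weather forecast in Paris"))
```

The point is just that the context only ever holds the top k descriptions instead of the whole catalog. Whether that beats the old RAG setup depends entirely on how good the ranking step is, and word overlap like this would be easy to fool.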
Seems like we traded scalability for accuracy, then accuracy for scalability… but I guess maybe we’ve come out on top because whatever they are using for tool search is better than RAG?