I redesigned the protocol by which the Mercurial DVCS discovers the common DAG subset between the client and the server.

Firstly, my approach ("set discovery") was simply to take relatively dumb samples of nodes from the leaves towards the roots, ask the other party whether they knew these nodes, and then iteratively refine with more roundtrips. In practice, this beat the previous, more sophisticated approach ("tree discovery") by far; that one tries to use the structure of the DAG to cleverly select "highly informative" nodes.
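
To make the idea concrete, here is a rough Python sketch of sample-and-refine discovery. It is not the actual Mercurial code: the function names, the `parents` dict representation, the `remote_known` callback and the "heads first, then random fill" sampling rule are all assumptions made up for illustration. It relies only on two facts about a DAG: if the other side knows a node it knows all of its ancestors, and if it does not know a node it cannot know any of its descendants.

```python
import random


def ancestors(parents, nodes):
    """Return `nodes` plus everything reachable via parent links."""
    seen, stack = set(), list(nodes)
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(parents[n])
    return seen


def descendants(parents, nodes):
    """Return `nodes` plus everything reachable via child links."""
    children = {n: [] for n in parents}
    for n, ps in parents.items():
        for p in ps:
            children[p].append(n)
    seen, stack = set(), list(nodes)
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(children[n])
    return seen


def take_sample(parents, undecided, size, rng):
    """Dumb sampling (assumed rule): heads of the undecided set first,
    then random undecided nodes until the sample is full."""
    has_child = {p for n in undecided for p in parents[n] if p in undecided}
    heads = sorted(undecided - has_child)
    sample = heads[:size]
    rest = sorted(undecided - set(sample))
    extra = rng.sample(rest, min(size - len(sample), len(rest)))
    return sample + extra


def discover_common(parents, remote_known, sample_size=4, seed=0):
    """Return (common nodes, number of roundtrips used)."""
    rng = random.Random(seed)
    undecided, common, roundtrips = set(parents), set(), 0
    while undecided:
        sample = take_sample(parents, undecided, sample_size, rng)
        answers = remote_known(sample)  # one network roundtrip
        roundtrips += 1
        yes = [n for n, known in zip(sample, answers) if known]
        no = [n for n, known in zip(sample, answers) if not known]
        known_here = ancestors(parents, yes)
        common |= known_here
        undecided -= known_here                  # known, with all ancestors
        undecided -= descendants(parents, no)    # unknown, with all descendants
    return common, roundtrips


# Local history: a linear chain 0 <- 1 <- ... <- 9; the remote has 0..5.
parents = {i: ([i - 1] if i else []) for i in range(10)}
remote = set(range(6))
common, trips = discover_common(parents, lambda ns: [n in remote for n in ns])
```

Every sampled node gets classified in the round it is asked about, so the undecided set shrinks on each roundtrip and the loop always terminates.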

Secondly, I started with a symmetric setup in which the client sent samples to the server, and the server responded with information about those samples plus samples of its own. It worked great, sometimes saving hundreds of network roundtrips. However, computing the samples is relatively expensive. Another contributor suggested that it would work almost as well if the server were kept dumb and just responded, for each sample node, whether it knew it or not. This massively reduced server load and kept the protocol much simpler.
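
As a sketch of what "dumb" means on the server side (the class and method names here are hypothetical, not Mercurial's wire protocol), the server's whole job per roundtrip reduces to one membership lookup per sampled node, with no DAG traversal and no sampling of its own:

```python
class DumbServer:
    """Hypothetical server that only answers membership queries."""

    def __init__(self, nodes):
        self._nodes = frozenset(nodes)
        self.queries = 0  # roundtrips served, for illustration only

    def known(self, sample):
        """Return one boolean per sampled node, in the order asked."""
        self.queries += 1
        return [n in self._nodes for n in sample]


server = DumbServer({"a", "b", "c"})
reply = server.known(["c", "x", "a"])
```

All the expensive work (choosing samples, walking ancestors and descendants) stays on the client, which is where the load reduction comes from.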

https://repo.mercurial-scm.org/hg/file/tip/mercurial/setdisc... https://repo.mercurial-scm.org/hg/rev/cb98fed52495