Looks good! Two important things are missing from the benchmarks, though:

- CPU usage under concurrency: many of these use spin locks or retry loops on atomics, which can burn up to 100% of a core's time just spinning.

- Latency under concurrency: atomics cause cache-line bouncing, which kills latency, especially p99 latency.

I don't write Go, but respect to the author for listing trade-off considerations for each of the implementations tested, rather than just proclaiming their library the overall winner.

Will we also eventually get a generic sync.Map?

Almost certainly, since the internal HashTrieMap is already generic. But for now this author's package stands in nicely.
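
In the meantime, a thin type-safe wrapper over the existing `sync.Map` is also an option. This is only a sketch: `TypedMap` is a made-up name, not a standard library type and not the author's package, and it covers just a few of the methods.

```go
package main

import (
	"fmt"
	"sync"
)

// TypedMap is a hypothetical generic wrapper around sync.Map.
// The zero value is ready to use, like sync.Map itself.
type TypedMap[K comparable, V any] struct {
	m sync.Map
}

// Store sets the value for key.
func (t *TypedMap[K, V]) Store(key K, value V) {
	t.m.Store(key, value)
}

// Load returns the value stored for key, or the zero value of V and
// false if the key is absent.
func (t *TypedMap[K, V]) Load(key K) (V, bool) {
	v, ok := t.m.Load(key)
	if !ok {
		var zero V
		return zero, false
	}
	// The comma-ok form avoids a panic when V is an interface type
	// and a nil value was stored.
	value, _ := v.(V)
	return value, true
}

// Delete removes key from the map.
func (t *TypedMap[K, V]) Delete(key K) {
	t.m.Delete(key)
}

// Range calls f for each entry until f returns false.
func (t *TypedMap[K, V]) Range(f func(key K, value V) bool) {
	t.m.Range(func(k, v any) bool {
		key, _ := k.(K)
		value, _ := v.(V)
		return f(key, value)
	})
}

func main() {
	var hits TypedMap[string, int]
	hits.Store("home", 3)
	n, ok := hits.Load("home")
	fmt.Println(n, ok)
}
```

The wrapper still boxes keys and values into `any` underneath, so it only buys type safety at the call site, not the performance a natively generic map could offer.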