Looking at the first optimization, I wonder whether double-checking after acquiring the exclusive lock brings any performance benefit. The whole premise is that cache access is read-heavy, so not taking an exclusive lock for reads already eliminates by far the biggest problem.
The (I presume) rare cases of overlapping updates from different threads, given that updates themselves are infrequent, don't seem like a big deal compared to eliminating the lock. It would be interesting to see benchmark numbers for each of those optimizations separately.
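
To make the question concrete, here is a rough sketch of the two variants I have in mind. It is not the code under discussion: `ReadMostlyCache`, `hammer`, and the `double_check` flag are hypothetical names, and it assumes CPython, where a plain dict lookup can stand in for the lock-free (or shared-lock) read path. The only thing it measures is how many redundant loads the re-check saves when several threads miss the same cold key at once.

```python
import threading
import time


class ReadMostlyCache:
    """Hypothetical cache: hits take no lock, misses take an exclusive lock."""

    def __init__(self, loader, double_check=True):
        self._loader = loader
        self._double_check = double_check
        self._data = {}
        self._lock = threading.Lock()
        self.loads = 0  # number of times the loader actually ran

    def get(self, key):
        # Fast path: a hit never touches the exclusive lock.
        try:
            return self._data[key]
        except KeyError:
            pass
        with self._lock:
            # Optional re-check: another thread may have filled the entry
            # while this one was waiting for the lock.
            if self._double_check and key in self._data:
                return self._data[key]
            value = self._loader(key)
            self.loads += 1  # safe: only modified while holding the lock
            self._data[key] = value
            return value


def hammer(double_check, threads=8):
    """Release `threads` readers onto one cold key; return the loader run count."""

    def slow_loader(key):
        time.sleep(0.01)  # stand-in for an expensive update
        return key * 2

    cache = ReadMostlyCache(slow_loader, double_check)
    barrier = threading.Barrier(threads)

    def worker():
        barrier.wait()  # line all threads up so their misses overlap
        cache.get(21)

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return cache.loads


if __name__ == "__main__":
    print("loads with re-check:   ", hammer(True))
    print("loads without re-check:", hammer(False))
```

With the re-check, the loader runs once per key no matter how many threads collide. Without it, each thread that missed before the entry was filled repeats the load. Whether that difference matters in practice depends on how often misses actually overlap and how expensive the load is, which is exactly what separate benchmarks would show.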