We are talking about cache friendliness here, including compact data representation.
The B-tree algorithm requires nodes to be filled only up to some prespecified fill factor, usually from 50% to 70%. That is not compact by any measure.
Both LSM trees and COLA allow for much more compact storage. They are also pretty cache friendly in the "cache-oblivious" sense.
Read up on the COLA paper and, funnily enough, inspired by LSM trees I implemented something like a "basic COLA" a few months back (the thing I mentioned for size coding): append to the end of the array, then hierarchically re-sort from the bottom up for each bit of the new size that turns 0 (size divisible by 2: sort the last 2; divisible by 4: sort the last 4; and so on), and search in O((log2 n)^2) by binary-searching each sorted run.
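A minimal sketch of what I mean (names like `BasicCola` are just for illustration, not my actual size-coding version): the flat array always consists of sorted runs whose sizes are the set bits of its length, largest first, so a lookup binary-searches at most log2(n) runs.

```python
import bisect

class BasicCola:
    """Flat append-only array kept as sorted runs sized by the binary
    digits of len(a), largest run first (a basic-COLA-like layout)."""

    def __init__(self):
        self.a = []

    def insert(self, key):
        self.a.append(key)
        n = len(self.a)
        run = 2
        # For each trailing bit of the new size that turned 0,
        # re-sort (effectively merge) the tail of that length.
        while n % run == 0:
            self.a[-run:] = sorted(self.a[-run:])
            run *= 2

    def contains(self, key):
        # Binary-search each sorted run; run boundaries follow the
        # set bits of n, so total work is O((log2 n)^2) comparisons.
        n = len(self.a)
        start = 0
        bit = 1 << (n.bit_length() - 1) if n else 0
        while bit:
            if n & bit:
                end = start + bit
                i = bisect.bisect_left(self.a, key, start, end)
                if i < end and self.a[i] == key:
                    return True
                start = end
            bit >>= 1
        return False
```

Note that `sorted` on the tail is really a merge of two already-sorted halves, which Python's timsort detects, so the amortized insert cost stays low.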
B-tree fill factors are a parameter that can be tuned to avoid extra splits depending on your insertion patterns; an in-memory variant doesn't need to be tuned the way a disk-based one does.
Also, nothing prevents the "bottom" of an LSM variant from being a dense B-tree-like structure that simply receives fewer updates.
The COLA paper also never mentions threading. An in-memory B-tree should scale better with size under multi-threading, while I don't see an easy way to avoid big locks with COLA; maybe that's why the COLA paper was from 2007 and we haven't seen much further development on it? (LSM trees work beautifully, of course, but then you basically have the double-space issue, as with semi-space garbage collectors.)
In the end, what you choose depends on your application and access patterns; COLA is a clever structure that probably shines in some scenarios.
My scenario, for instance, was mostly heavy on consecutive insertions as well as lookups of potentially arbitrary compound keys, a good fit for a COLA-like structure since both of those are cheap there.
An extension of COLA called the fractal tree index (for on-disk storage) was commercialized by Tokutek: https://en.wikipedia.org/wiki/Fractal_tree_index