Hacker News

Nonsense. The best you will ever do, even with full application knowledge and complete control of the machine, is an LRU cache replacement algorithm. But when you do it yourself you have to juggle the fine details of which indices to prioritize, and you will never get it perfect. If you're not running a dedicated machine, as soon as any other processes run all your careful tuning goes out the window.

Since LMDB manages multiple tables as a tree of trees, no fine tuning is needed. The internal paths to every hot page automatically take priority, regardless of which index or how large each index is. So a simpleminded LRU always makes optimal use of available cache, regardless of access pattern or other load on the system.

bagxrvxpepzn 5 hours ago

First, let me just say that while it's possible to interpret my original comment as uniquely applying to LMDB (or databases with similar page cache designs), in practice it applies to all general-purpose databases, including PostgreSQL and SQLite. This is because all general-purpose databases will eventually fall short when it comes to tweaking behavior to meet application-specific requirements, customizations notwithstanding. So to the extent that one should not use LMDB for anything that matters, one should also not use PostgreSQL or SQLite for anything that matters. If that corollary appears false in your frame of reference, then my statement about LMDB should also be false.

For high-stakes applications, you will have to maintain your own database code (either original or derived from an existing database), and that database code will need its own page caching layer (or a patched kernel). A generic page caching system (whether in-kernel with mmap or out of kernel) will not do. I acknowledge most applications don't operate in this regime.

> The best you will ever do, even with full application knowledge and complete control of the machine, is an LRU cache replacement algorithm.

This is not true. Applications often have specific high-priority data which should always exist in memory. That may be a moot point because you can do mlock() with mmap(). If we focus only on general-purpose caching, then even in that case there are many alternatives to LRU. SIEVE and ARC are two notable alternatives that perform significantly better for certain data. An application developer should be able to experiment with different general-purpose caching strategies for different types of data; mmap() does not afford this.
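To make the SIEVE point concrete, here is a minimal sketch in Python. The class name `SieveCache` and its `access` method are made up for illustration and do not come from any library; it uses a plain list rather than the linked list a real implementation would use, and assumes a capacity of at least 1. The idea it shows: a hit only sets a "visited" bit, and eviction sweeps a hand from oldest to newest, clearing bits until it finds an unvisited entry.

```python
class SieveCache:
    """Toy SIEVE cache over hashable keys (illustrative sketch only)."""

    def __init__(self, capacity):
        self.capacity = capacity  # assumed >= 1
        self.order = []           # keys in insertion order, oldest first
        self.visited = {}         # key -> visited bit
        self.hand = 0             # index into self.order

    def access(self, key):
        """Return True on a hit; on a miss, insert the key and return False."""
        if key in self.visited:
            # A hit does not reorder anything, it only marks the entry.
            self.visited[key] = True
            return True
        if len(self.order) >= self.capacity:
            self._evict()
        self.order.append(key)    # new entries go in at the newest end
        self.visited[key] = False
        return False

    def _evict(self):
        # Sweep from the hand toward newer entries, giving visited entries
        # a second chance by clearing their bit, wrapping at the newest end.
        while self.visited[self.order[self.hand]]:
            self.visited[self.order[self.hand]] = False
            self.hand = (self.hand + 1) % len(self.order)
        victim = self.order.pop(self.hand)
        del self.visited[victim]
        # After the pop, the hand already points at the next newer entry;
        # wrap to the oldest entry if the victim was the newest one.
        if self.hand >= len(self.order):
            self.hand = 0


cache = SieveCache(3)
for key in ["a", "b", "c", "a", "d", "e", "f"]:
    cache.access(key)
```

With capacity 3 and the trace above, this sketch still holds `a` at the end, because the one re-access protected it while `b`, `c` and `d` were swept out. A strict LRU of the same size would have evicted `a` on the final miss. Whether that behavior helps depends entirely on the workload, which is the point about being able to experiment.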

Thank you Mr. Chu for your contributions to the technology commons and humanity in general.

quotemstr 5 hours ago

> The best you will ever do, even with full application knowledge and complete control of the machine, is an LRU cache replacement algorithm

First of all, even the kernel can do better than simple LRU. We have MGLRU now, for example. That said, the kernel is at a structural disadvantage.

A general purpose eviction and prefetch algorithm is like an automatic transmission on a car. It can react only to what it's seen.

When you drive stick, you can react to what you can see on the road ahead of you. A database has a query plan. It can see the future as well as remember the past. It has more information than the kernel.

> So a simpleminded LRU always makes optimal use of available cache, regardless of access pattern or other load on the system

That cannot be true. If I have a random access pattern, LRU will perform no better than random. If I have a future-oracle, I can just evict what's most distant in my set of future accesses.
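A small simulation makes the gap visible. This is only a sketch: the function names `lru_hits` and `belady_hits` are hypothetical, and the trace is a cyclic scan over one more page than the cache holds, chosen because it is a well-known bad case for LRU (it is not the random pattern mentioned above). The second function is the future-oracle policy: on a miss with a full cache, evict the page whose next access is farthest away.

```python
from collections import OrderedDict


def lru_hits(trace, capacity):
    """Count cache hits under LRU eviction for a sequence of page ids."""
    cache = OrderedDict()  # least recently used first
    hits = 0
    for page in trace:
        if page in cache:
            hits += 1
            cache.move_to_end(page)        # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # drop the least recently used
            cache[page] = True
    return hits


def belady_hits(trace, capacity):
    """Count hits when evicting the page whose next use is farthest away."""
    # Precompute, for each position, where the same page is accessed next.
    next_use = [0] * len(trace)
    last_seen = {}
    for i in range(len(trace) - 1, -1, -1):
        next_use[i] = last_seen.get(trace[i], float("inf"))
        last_seen[trace[i]] = i

    cache = {}  # page -> position of its next access
    hits = 0
    for i, page in enumerate(trace):
        if page in cache:
            hits += 1
        elif len(cache) >= capacity:
            victim = max(cache, key=cache.get)  # farthest future access
            del cache[victim]
        cache[page] = next_use[i]
    return hits


trace = [0, 1, 2, 3] * 5  # 4 pages cycled through a 3-page cache
lru = lru_hits(trace, 3)
oracle = belady_hits(trace, 3)
```

On this trace LRU never gets a hit, since it always evicts exactly the page that is needed next, while the oracle policy does get hits. A real database does not have a perfect oracle, but a query plan gives it part of one.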

Regardless of whether you're right about the suitability of LRU for this or that workload, it's simply false, mathematically, from a computer science POV, that LRU is optimal.

And if you go around making confidently wrong claims like this, one must wonder what else you are wrong about. If you want to be disagreeable in public, fine: just make sure you have math on your side first.

hyc_symas 4 hours ago

In the time it takes for your query optimizer to dissect a query and "look ahead", LMDB would have already answered a million queries. You think your magical "future oracle" is zero cost? How many KLOCs is it? LMDB's hot paths fit entirely inside a CPU's L1 cache.