The best way to store information depends on how you intend to use (query) it.
The query itself represents information. If you can anticipate 100% of the ways in which you intend to query the information (no surprises), I'd argue there might be an ideal way to store it.
This is connected to the equivalence between optimal indexing and optimal AGI. The "best" way is optimal over the entire universe of possible queries, but has the downside of being profoundly computationally intractable.
Requiring perfect knowledge of how information will be used is brittle. It has the major benefit of making the algorithm design problem tractable, which is why we do it.
An alternative approach is to exclude large subsets of queries from the universe of answerable queries without enumerating the queries the system can answer. The goal is to qualitatively reduce the computational intractability of the universal case by pruning it, without over-specifying the answerable queries as traditional indexing does. This is approximately what "learned indexing" attempts to do.
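To make that concrete, here's a minimal sketch of the learned-indexing idea: instead of a B-tree, fit a model that predicts a key's position in a sorted array, then correct the prediction with a bounded local search. The class and its fields are illustrative, not any particular system's API.

```python
import bisect

class LinearLearnedIndex:
    """Toy learned index: a linear model maps key -> predicted position,
    and the worst-case prediction error bounds a local binary search."""

    def __init__(self, keys):
        self.keys = sorted(keys)
        n = len(self.keys)
        # Least-squares fit of position ~ slope * key + intercept.
        mean_k = sum(self.keys) / n
        mean_p = (n - 1) / 2
        var = sum((k - mean_k) ** 2 for k in self.keys) or 1
        self.slope = sum((k - mean_k) * (p - mean_p)
                         for p, k in enumerate(self.keys)) / var
        self.intercept = mean_p - self.slope * mean_k
        # Record the worst prediction error to bound the search window.
        self.err = max(abs(self._predict(k) - p)
                       for p, k in enumerate(self.keys))

    def _predict(self, key):
        return round(self.slope * key + self.intercept)

    def lookup(self, key):
        guess = self._predict(key)
        lo = max(0, guess - self.err)
        hi = min(len(self.keys), guess + self.err + 1)
        i = bisect.bisect_left(self.keys, key, lo, hi)
        return i if i < len(self.keys) and self.keys[i] == key else None

idx = LinearLearnedIndex([2, 3, 5, 7, 11, 13, 17, 19])
```

The "pruning" lives in the model: the index only promises fast answers for point lookups on this key distribution, rather than enumerating an explicit structure for every anticipated query.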
This is exactly right, and the article is clickbait junk.
Given the domain name, I was expecting something about the physics of information storage, and some interesting law of nature. Instead, the article is a bad introduction to data structures.
You both are affirming the title of the article.
"No single best way", meaning "it depends."
But don't let something like literacy get in the way of an opportunity to engage in meaningless outrage.
This line of thought works for storage in isolation, but does not hold up if write speed is a concern.
Speed can always be improved: if a method is too slow, run multiple machines in parallel. Longevity is different, because it cannot scale that way. A million CD burners together are very fast, but the CDs won't last any longer. So the storage method is the more profound tech problem.
So long as (fast/optimal) real-time access to new data is not a concern, you can introduce compaction to solve both problems.
> (fast/optimal) real-time access to new data
https://en.wikipedia.org/wiki/Optimal_binary_search_tree#Dyn...
As a line of thought, it totally does: you just extend the workload description to include writes. Where this gets problematic is that the ideal structure for transactional writes is nearly pessimal from a read standpoint, which is why we seem to end up doubling the write overhead: once to remember and once to optimize. Or we take a highly write-centric approach like an LSM tree.
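The "write twice" pattern can be sketched in a few lines. This toy LSM-style store is illustrative only: writes land in a memtable (fast to remember), which is flushed to immutable sorted runs; reads must scan runs newest-to-oldest, and compaction is the second, read-optimizing write.

```python
class ToyLSM:
    """Toy LSM-style store: O(1) writes into a memtable, reads that
    degrade with the number of runs, and compaction to restore reads."""

    def __init__(self, memtable_limit=4):
        self.memtable = {}
        self.runs = []  # newest run first; each run is an immutable snapshot
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        # First write: just remember.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.runs.insert(0, dict(sorted(self.memtable.items())))
            self.memtable = {}

    def get(self, key):
        # Read cost grows with the number of runs to consult.
        if key in self.memtable:
            return self.memtable[key]
        for run in self.runs:
            if key in run:
                return run[key]
        return None

    def compact(self):
        # Second write: merge runs so reads touch one structure again.
        merged = {}
        for run in reversed(self.runs):  # oldest first, so newer values win
            merged.update(run)
        self.runs = [dict(sorted(merged.items()))] if merged else []
```

The `memtable_limit` and compaction policy are the tuning knobs: shrink the memtable or compact more aggressively and you shift cost from reads back to writes.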
I'd love to be clued in on more interesting architectures that either attempt to optimize both or provide a more continuous tuning knob between them.
Yes, with the important caveat that a lot of the time people don't have a crystal ball, can't see far into the future, and don't know whether their intents will materialise in practice 12 months down the line. They should therefore store information in Postgres until that is no longer a feasible option.
A consequence of there being no generally superior storage mechanism is that technologists, as a community, should have an agreed default standard for storage, which happens to be relational.
What if the various potential queries demand different / conflicting compression schemes?
I'd say this is spiritually what the no-free-lunch theorems are about: whatever "AI model" / query system you build, it is implicitly biased towards queries coming from one slice of possible futures.