I've been intrigued by the thing they call the "data product" for a long time: the fact that there's a spectrum between precomputing all of your position analysis so you do no runtime analysis, and precomputing nothing and doing everything at runtime. It's a general property of many algorithms that you can precompute varying amounts of work in order to reduce runtime complexity. It's also the difference between "telling an LLM to do a task" and "telling an LLM to write a bunch of code to do a task", which are the two ends of a spectrum as well.
Especially interesting is the fact that the optimal strategy, i.e. where on the spectrum you land, is affected by the effectiveness of your algorithms on each side. The more efficiently you can cognitively compress precomputed decisions, the more it makes sense to precompute; the more efficiently you can apply move-selection techniques at runtime, the more it makes sense to defer the work. And the two sides interact a lot: in chess, for instance, there are simple heuristics for a lot of endgames, meaning that the state space you explore at runtime can terminate as soon as it reaches an endgame you have a memorized solution for.
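To make the two ends concrete, here's a toy Python sketch. It uses a made-up subtraction game (not chess; my invention for illustration): one extreme recomputes everything at query time, the other precomputes an "endgame table" of small positions so the runtime search can stop as soon as it hits a memorized one.

```python
# Toy subtraction game: players alternate removing 1, 3, or 4 stones;
# whoever takes the last stone wins.

# One end of the spectrum: no precomputation, search everything at runtime.
def wins_runtime(n):
    if n <= 0:
        return False  # no stones left: the previous player already won
    return any(not wins_runtime(n - k) for k in (1, 3, 4) if k <= n)

# The other end: precompute a small "endgame table" up front, so the
# runtime search terminates the moment it reaches a memorized position.
ENDGAME = {n: wins_runtime(n) for n in range(8)}

def wins_hybrid(n):
    if n in ENDGAME:  # memorized solution: stop searching here
        return ENDGAME[n]
    return any(not wins_hybrid(n - k) for k in (1, 3, 4))
```

The two always agree; the hybrid just trades a little precomputed memory for a shallower runtime search, which is exactly the interaction the endgame-heuristic point describes.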
I wonder if anyone knows of anyone examining this phenomenon in generality?
Isn't this very similar to, or a special case of, trading space for time or vice versa (e.g. in algorithm analysis)?
That is certainly an example of it, but I feel like what I'm talking about is more general. In algorithms you tend to be able to say concretely exactly how much space vs time you're trading off. But in a game like Connect 4 or chess it's a lot harder to say exactly how much you gain from, say, adding one more heuristic to your board analysis, or memorizing one more opening subtree. I don't know how to think about that mathematically, and I would like to. It's related, I guess, to what the OP says about finding a graph which is information-theoretically small rather than graph-theoretically small.
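To illustrate the "concretely quantifiable" end of that contrast, here's a classic space-for-time sketch (standard technique, not from the thread): a precomputed popcount table where the trade is exact — 256 bytes of space buys a fixed four lookups per 32-bit word.

```python
# Precompute: number of set bits for every 8-bit value (exactly 256 bytes).
POPCOUNT8 = bytes(bin(i).count("1") for i in range(256))

def popcount32(x):
    # Runtime: four table lookups instead of looping over 32 bits.
    return (POPCOUNT8[x & 0xFF]
            + POPCOUNT8[(x >> 8) & 0xFF]
            + POPCOUNT8[(x >> 16) & 0xFF]
            + POPCOUNT8[(x >> 24) & 0xFF])
```

Here you can state the tradeoff precisely; the interesting thing about chess heuristics is that you can't.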
Generally it's interesting to contemplate that there may exist heuristics no one has thought of yet that would dramatically improve your overall performance. Actually, I can think of an example. When AlphaZero and Leela (the ML-based chess engines) started beating Stockfish (the preeminent graph-search chess engine), one of the early things people noticed was that they loved to push their a- and h-pawns down the board, a sorta passive-looking move that people had systematically undervalued before (cf. [1]; I remember seeing a good YouTube video about it too, but I can't recall which). Which means that just having this heuristic was worth some amount of Elo the whole time; no one had realized it! Presumably the optimal play for a human in chess, whose mental capacity is limited, is a certain mix of memorized fixed positions and heuristics, and the interactions between those are what's interesting.
[1]: https://www.reddit.com/r/chess/comments/ljp151/why_is_h4_pla...
I’ll throw in another example.
Doing the work of indexing, summarization, or compression up front, vs just doing retrieval on the raw records. It could be embeddings, presorts, or indices.
But I think the parent poster is correct. There’s an inherent write-vs-read tradeoff as well as memory vs compute, maybe with some generalization vs specialization too.
Idk if I'm explaining it properly, but you can always front-load more effort on a narrower problem space and then exploit that work on the same space downstream.
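The write-vs-read version of the tradeoff can be sketched in a few lines (a toy illustration, not any particular system): front-load work at write time by building an inverted index, or skip preparation and scan every raw record at query time.

```python
from collections import defaultdict

docs = ["precompute an index", "scan the raw records", "index the records"]

# Read-optimized end: pay at write time to build an inverted index.
index = defaultdict(set)
for doc_id, doc in enumerate(docs):
    for word in doc.split():
        index[word].add(doc_id)

def search_indexed(word):
    return sorted(index.get(word, set()))

# Write-optimized end: no preparation, scan every record at query time.
def search_scan(word):
    return [i for i, doc in enumerate(docs) if word in doc.split()]
```

Same answers either way; the only question is which side of the read/write boundary you pay on.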
I think with your LLM example there is no code/data split: the LLM is caching a solution to a narrow problem by writing the code.