It always annoys and amazes me that people in this field lack the basic understanding that closed-world, finite-information abstract games are a unique and trivial class of problem. So much of the so-called "world model" ideological mumbo jumbo comes from these setups.
Sampling board state from an abstract board space isn't a statistical inference problem. There's no missing information.
The whole edifice of science is a set of experimental and inferential practices to overcome the massive information gap between the state of a measuring device and the state of what, we believe, it measures.
In the case of natural language, the gap between a sequence of symbols, "the war in ukraine", and those aspects of the world the symbols refer to is enormous.
The idea that there is even an RL-style "reward" function to describe this gap is pseudoscience. As is the false equivalence between sampling abstracta such as games and measuring the world.
> [...] and trivial problem.
It just took decades and some impressive breakthroughs to solve, so I wouldn't really call it "trivial". However, I do agree with you that they're a different class of problem from problems with no clear objective function, and probably much easier to reason about.
They're a trivial inference problem, not a trivial problem to solve as such.
As in: if I need to infer the radius of a circle from N points sampled from that circle... yes, I'm sure there's a textbook of algorithms etc. with a lot of work spent on them.
But in the sense of statistical inference, you're only learning a property of a distribution given that distribution... there isn't any inferential gap. As N -> inf, you recover the entire circle itself.
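To make that concrete, here's a minimal sketch (Python, all names and numbers mine): estimate the radius from points sampled on the circle itself. Because the data just are the abstractum, the estimate is exact at any N.

```python
import math
import random

def sample_circle(radius, n):
    """Sample n points uniformly on a circle of the given radius."""
    return [(radius * math.cos(t), radius * math.sin(t))
            for t in (random.uniform(0, 2 * math.pi) for _ in range(n))]

def estimate_radius(points):
    """Estimate the radius as the mean distance of the points from the origin."""
    return sum(math.hypot(x, y) for x, y in points) / len(points)

for n in (10, 1000, 100000):
    pts = sample_circle(radius=2.0, n=n)
    print(n, estimate_radius(pts))   # -> 2.0 for every n (up to float rounding)
```

There is no "gap" for the estimator to close: the sample fully describes its target.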
Compare with, say, learning the 3d structure of an object from 2d photographs. At any rotation of that object, you have a new pixel distribution. So in pixel space a 3d object is an infinite number of distributions, and the inference goal in pixel space is to choose between sets of these infinities.
That's actually impossible without bridging information (i.e., some theory). And in practice, it isn't solved in pixel space... you suppose some 3d geometry and use data to refine it. So you solve it in 3d-object-property-space.
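A rough sketch of that point (assumptions mine: orthographic projection, a toy point cloud standing in for pixels). One 3d object yields a different 2d distribution at every rotation, so inference purely in "pixel space" has to pick among infinitely many distributions unless some 3d geometry is posited.

```python
import numpy as np

# A toy 3d "object": four points.
object_3d = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]], float)

def rotate_y(points, angle):
    """Rotate points about the y-axis by `angle` radians."""
    c, s = np.cos(angle), np.sin(angle)
    rot = np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])
    return points @ rot.T

def project_2d(points):
    """Orthographic projection: drop the depth (z) coordinate."""
    return points[:, :2]

for angle in (0.0, 0.5, 1.0):
    print(angle, project_2d(rotate_y(object_3d, angle)).round(2))
# Same object, a different 2d point set at each rotation: the "distribution
# in pixel space" is not a fixed target the way the circle above is.
```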
With AI techniques, you have methods which work on abstracta (e.g., circles) being used on measurement data. So you're solving the 3d/2d problem in pixel space, expecting this to work because "objects are made out of pixels, aren't they?" NO.
So there's a huge inferential gap that you cannot bridge here. And the young AI fanatics in research keep churning out papers showing that it does work, so long as it's a circle, chess, or some abstract game.
Yes. Quantum mechanics for example is not something that could have been thought of even conceptually by anything “locked in a room”. Logically coherent structure space is so mind bogglingly big we will never come close to even the smallest fraction of it. Science recognizes that only experiments will bring structures like QM out of the infinite sea into our conceptual space. And as a byproduct of how experiments work, the concepts will match (model) the actual world fairly well. The armchair is quite limiting, and I don’t see how LLMs aren’t locked to it.
AGI won’t come from this set of tools. Sam Altman just wants to buy himself a few years of time to find their next product.
Forgive my naiveté here, but even though solutions to those finite-information abstract games are trivial, they are not necessarily tractable (for a looser definition of tractable here), and we still need to build heuristics for the subclass of such problems where we need solutions within a given finite time frame. Those heuristics might not be easy to deduce, and hence such models help in ascertaining them.
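A hedged sketch of what I mean by heuristics under a time budget (Python, all hooks are placeholders of mine, not any particular engine): exact game-tree search is well-defined but often intractable, so the search is cut off at a fixed depth and a heuristic evaluation stands in for the true value.

```python
def negamax(state, depth, evaluate, legal_moves, apply_move, game_over):
    """Depth-limited negamax: exact below the horizon, heuristic at it.

    `evaluate` must score the state from the perspective of the player to move,
    as is conventional for negamax.
    """
    if depth == 0 or game_over(state):
        return evaluate(state)          # heuristic replaces the intractable exact value
    best = float("-inf")
    for move in legal_moves(state):
        value = -negamax(apply_move(state, move), depth - 1,
                         evaluate, legal_moves, apply_move, game_over)
        best = max(best, value)
    return best
```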
Yes, and this is how computer "scientists" think of problems -- but this isn't science, it's mathematics.
If you have a process, e.g., points = sample(circle), which fully describes its target as n -> inf (i.e., points = circle as n -> inf), you aren't engaged in statistical inference. You might be using some of the same formulas, but the whole system of science and statistics was created for a radically different problem, with radically different semantics from everything you're doing.
E.g., the height of mercury in a thermometer never becomes the liquid being measured... it might seem insane/weird/obvious to mention this, but we literally have Berkeleyan-style neo-idealists in AI research who don't realise this...
Who think that because you can find representations of abstracta in other spaces they can be projected into, this therefore tells you anything at all about inference problems. As if it were the neural network algorithm itself (a series of multiplications and additions) that "revealed the truth" in all data given to it. This, of course, is pseudoscience.
It only applies to mathematical problems, for obvious reasons. If you use a function approximation algorithm to approximate a function, do not be surprised that you can succeed. The issue is that the relationship between, say, the state of a thermometer and the temperature of its target system is not an abstract function which lives in the space of temperature readings.
More precisely, in the space of temperature readings the actual causal relationship between the height of the mercury and the temperature of the target shows up as an infinite number of temperature distributions (with any given trained NN learning only one of these). None of these is a law of nature -- laws of nature are not given by distributions in measuring devices.
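A toy illustration of "learning only one of these" (my construction, not a claim about any real system): a regression fit on thermometer readings taken under one set of conditions learns only that one reading-to-temperature map. Shift an unmodelled cause (here a fixed offset standing in for room temp, heating, etc.) and the same fit is systematically wrong.

```python
import numpy as np

rng = np.random.default_rng(0)

true_temps = rng.uniform(40, 90, 200)                         # coffee temperatures, deg C
train_readings = true_temps + 2.0 + rng.normal(0, 0.3, 200)   # readings under one fixed offset

# Fit reading -> temperature under the training conditions only.
slope, intercept = np.polyfit(train_readings, true_temps, 1)

# Same instrument, different unmodelled conditions (offset now 5.0):
test_reading = 70.0 + 5.0 + rng.normal(0, 0.3)
predicted = slope * test_reading + intercept
print(f"true 70.0 C, predicted {predicted:.1f} C")   # off by roughly the change in offset
```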
Who doesn’t? Karpathy, and pretty much every researcher at OpenAI/DeepMind/FAIR, absolutely knows the trivial concept of fully observable versus partially observable environments, which is reinforcement learning 101.
Many don't understand it as a semantic difference
I.e., that when you're taking data from a thermometer in order to estimate the temperature of coffee, the issue isn't simply partial information.
It's that the information is about the mercury, not the coffee. In order to bridge the two you need a theory (e.g., about the causal reliability of heating / room temp / etc.).
So this isn't just a partial/full-information problem -- those are still mathematical toys. This is a reality problem. This is a "you're dealing with a causal relationship between physical systems" problem. It is not a mathematical relationship. It isn't merely partial; it is not a matter of "information" at all. No amount of it could ever make the mercury, coffee.
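For contrast, here is a sketch of what bridging with a theory can look like (a toy model of mine, not anything specific being claimed above): Newton's law of cooling relates the coffee's temperature over time to the ambient temperature, so a reading plus that causal model lets you infer the coffee's earlier temperature. The reading alone, with no model of heating/room temp, does not.

```python
import math

def cooled_temp(initial_temp, room_temp, k, t):
    """Newton's law of cooling: temperature after t minutes."""
    return room_temp + (initial_temp - room_temp) * math.exp(-k * t)

def infer_initial_temp(reading, room_temp, k, t):
    """Invert the cooling model to recover the temperature at t = 0."""
    return room_temp + (reading - room_temp) * math.exp(k * t)

room_temp, k = 21.0, 0.05                                 # assumed ambient temp and cooling constant
reading_at_10_min = cooled_temp(85.0, room_temp, k, t=10) # what the instrument reports later
print(infer_initial_temp(reading_at_10_min, room_temp, k, t=10))  # ~85.0
```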
Computer scientists have been trained on mathematics and deployed as social scientists, and the naiveté is incredible.