So, the hidden mental model the OP is expressing but never quite spells out is that LLMs can be thought of as compressing related concepts into approximately orthogonal subspaces of a vector space whose capacity is upper-bounded by superposition across all of their weights. Since training has the effect of compressing knowledge into subspaces, a necessary corollary is that there are now regions of the vector space that contain very little. Those are the valleys that need to be tunneled through, i.e. the model needs to activate disparate regions of its knowledge manifold simultaneously, which seems like it might be difficult to do. I’m not sure this is a good way of looking at things, though, because inference isn’t topology, and I’m not sure that abstract reasoning can be reduced to finding ways to connect concepts that were learned in isolation.
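For what it's worth, the "approximately orthogonal" part is easy to see numerically. A minimal numpy sketch, assuming random unit vectors as a stand-in for learned feature directions and made-up sizes for the hidden width and concept count:

```python
# In high dimensions, random directions are nearly orthogonal, which is what
# lets a model pack many more "concepts" than dimensions into one space
# (the superposition idea). Sizes below are illustrative, not from any model.
import numpy as np

rng = np.random.default_rng(0)
dim = 1024          # hypothetical hidden-state width
n_concepts = 10_000 # far more concept directions than dimensions

# Random unit vectors standing in for learned concept directions.
vecs = rng.normal(size=(n_concepts, dim))
vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)

# Pairwise cosine similarities on a sample: most are close to 0,
# i.e. the directions are almost orthogonal even though n_concepts >> dim.
sample = vecs[:1000]
cos = sample @ sample.T
off_diag = np.abs(cos[~np.eye(len(sample), dtype=bool)])
print(f"mean |cos| = {off_diag.mean():.4f}, max |cos| = {off_diag.max():.4f}")
```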
Not the OP, but my interpretation is that if you model replies as points in a vector space, and assume that points from a given domain cluster close to each other, then replies that span two domains need to "tunnel" between those two clusters.
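A toy sketch of that picture, assuming each domain is a tight Gaussian blob (not real embeddings, and the numbers are arbitrary): the gap between clusters is where almost nothing lives, which is the "valley" to tunnel through.

```python
# Two domain clusters far apart in a toy embedding space; a cross-domain
# reply would have to sit in the sparse region between them.
import numpy as np

rng = np.random.default_rng(1)
dim = 64

center_a = rng.normal(size=dim) * 5
center_b = rng.normal(size=dim) * 5
domain_a = center_a + 0.3 * rng.normal(size=(200, dim))
domain_b = center_b + 0.3 * rng.normal(size=(200, dim))

# The midpoint between the centroids is far from every training point,
# compared with the typical within-cluster spread.
midpoint = (center_a + center_b) / 2
within_a = np.linalg.norm(domain_a - center_a, axis=1).mean()
dist_to_a = np.linalg.norm(domain_a - midpoint, axis=1).min()
dist_to_b = np.linalg.norm(domain_b - midpoint, axis=1).min()

print(f"typical within-cluster spread: {within_a:.2f}")
print(f"midpoint's nearest neighbour in A: {dist_to_a:.2f}, in B: {dist_to_b:.2f}")
```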
A hallmark of intelligence is the ability to find connections between the seemingly disparate.
That's also a hallmark of some mental/psychological illnesses (paranoid schizophrenia family) and use of certain drugs, particularly hallucinogens.
The hallmark of intelligence in this scenario is not just being able to make the connections, but being able to pick the right ones.
The word "seemingly" is doing a lot of work here.
Sometimes things that look very different actually are represented with similar vectors in latent space.
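You can check that claim for yourself with any off-the-shelf embedding model. A quick sketch, assuming sentence-transformers' all-MiniLM-L6-v2 purely as a convenient stand-in for "latent space":

```python
# Sentences that share almost no surface vocabulary can still land close
# together in embedding space, while superficially unrelated ones do not.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

pairs = [
    ("The stock market crashed overnight.",
     "Equity prices collapsed while most traders were asleep."),
    ("The stock market crashed overnight.",
     "My cat knocked a glass off the table."),
]

for a, b in pairs:
    va, vb = model.encode([a, b])
    cos = np.dot(va, vb) / (np.linalg.norm(va) * np.linalg.norm(vb))
    print(f"cos({a!r}, {b!r}) = {cos:.2f}")
```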
When that happens to us it "feels like" intuition: something you can't really put your finger on, and that might take work to put into a form that can be transferred to another human who has a different mental model.
Actually, a hallmark could be to prune illusory connections, right? That would decrease complexity rather than amplify it.
Yes, that also happens, for example when someone first said that natural disasters are not triggered by offending the gods. It is all about making explanations as simple as possible, but no simpler.
Does this make conspiracy theorists highly intelligent?
No, but they emulate intelligence by making up connections between seemingly disparate things where none exist.
They make connections but lack the critical thinking skills to weed out the bad/wrong ones.
Which is why, just occasionally, they're right, but mostly by accident.