But an LLM shows similar effects.
COCONUT, PCCoT, PLaT and co. are directly linked to 'thinking in latent space'. Yann LeCun is working on this too; we have JEPA now.
Also, how do you explain how an LLM generates the next token when it has to add a feature to an existing code base? In my opinion it has structures that allow it to build a temporary model of that code.
Sure, an LLM lacks the emotional component. But there is something we humans also do which suggests to me that we are a lot closer to LLMs than we'd like to be: if you have a weird bodily feeling (stress, hot flashes, anger, etc.), your 'text area/LLM/speech area' also tries to make sense of it. It's not always very good at doing so. That emotional body feeling is not well aligned with it, and it takes time to either understand or ignore these kinds of inputs to the text/LLM/speech part of our brain.
I'm open to looking back in 5 years and saying 'man, that was a wild ride, but no AGI'. But given the current quality of LLMs, all the other architectures and types of models, and the money being thrown at AGI, for now I don't see a ceiling at all. I only see crazy, unprecedented progress.
I don't understand what part of what I said you disagree with.
You described how you think, plan, and reason about how to do things, and I assumed you mentioned your way of thinking because you believe an LLM isn't doing any of it.
I then showed counterexamples.
I don't think you showed counterexamples? Or can you link me to a paper that describes a language model thinking without predicting tokens?
My second sentence references all these papers:
"COCONUT, PCCoT, PLaT and co are directly linked to 'thinking in latent space'. yann lecun is working on this too, we have JEPA now."
And it does this thinking without producing tokens?
Yes.
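A toy sketch of the difference being argued about, assuming the COCONUT-style idea as I understand it: instead of decoding a token and feeding its embedding back in, the last hidden state itself is fed back as the next input. The "model" here (the matrix `W`, the 3-token `VOCAB`, and the functions `forward`, `nearest_token`, `reason`) is entirely hypothetical and tiny; it only shows where the information flows, not how any real paper implements it.

```python
import math

# Hypothetical toy "model": a fixed 2x2 linear map followed by tanh.
W = [[0.9, -0.4], [0.3, 0.8]]

# Hypothetical 3-token vocabulary with fixed 2-d embeddings.
VOCAB = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [-1.0, 0.0]}


def forward(x: list[float]) -> list[float]:
    """One pass through the toy model: input embedding -> hidden state."""
    return [math.tanh(sum(W[i][j] * x[j] for j in range(2))) for i in range(2)]


def nearest_token(h: list[float]) -> str:
    """Collapse a hidden state to the token whose embedding has the largest dot product."""
    return max(VOCAB, key=lambda t: sum(e * v for e, v in zip(VOCAB[t], h)))


def reason(x: list[float], n_steps: int, latent: bool) -> list:
    """Run n_steps of 'thinking'.

    latent=False: decode a token each step and feed its embedding back (ordinary CoT).
    latent=True:  feed the hidden state straight back in; no token is produced.
    """
    trace = []
    for _ in range(n_steps):
        h = forward(x)
        if latent:
            x = h  # continuous thought: nothing is squeezed through the vocabulary
            trace.append(h)
        else:
            token = nearest_token(h)
            x = VOCAB[token]  # only the discrete token survives to the next step
            trace.append(token)
    return trace


print(reason([1.0, 0.0], 3, latent=False))
print(reason([1.0, 0.0], 3, latent=True))
```

In token mode everything the model "knows" between steps has to fit into one of three symbols; in latent mode the full vector is carried forward.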
Btw, just because you have to do something to trigger the flow of information through the model doesn't mean it can't think. It only means that we have to build an architecture around the model, or build it into the model's base architecture, to enable more thinking.
We don't know how the brain is set up for this. We could have sub-agents, or we could be a Mixture of Experts type of 'model'.
There is also ongoing work on combining multimodal inputs and diffusion models, which look completely different from an output point of view.
On how an LLM does math: Anthropic showed in a blog article that they found structures for estimating numbers similar to the ones a brain uses.
Another experiment, by one individual, was to clone layers and simply insert each copy directly beneath the original layer. This improved performance on certain tasks. My assumption is that it lengthens and strengthens a kind of thinking structure.
But because LLMs are still so good and still deliver relevant improvements, I think a whole field around this kind of thinking is still largely unexplored.
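A toy sketch of the general "estimate plus exact detail" idea, assuming two parallel pathways for addition: one computes only a rough magnitude, the other only the exact ones digit, and the answer is the number with that ones digit closest to the estimate. The rounding rule and all function names are my own illustrative assumptions, not the circuit Anthropic actually describes.

```python
def rough_path(a: int, b: int) -> int:
    """Low-precision magnitude estimate: round b to the nearest ten before adding.

    Hypothetical stand-in for a fuzzy 'roughly how big is the answer' pathway.
    Its error always lies in [-4, +5].
    """
    return a + (b + 5) // 10 * 10


def last_digit_path(a: int, b: int) -> int:
    """High-precision but narrow pathway: only the ones digit of the sum."""
    return (a % 10 + b % 10) % 10


def combine(rough: int, last_digit: int) -> int:
    """Pick the number ending in `last_digit` that is closest to `rough`."""
    below = rough - (rough - last_digit) % 10  # largest value <= rough with that ones digit
    return min(below, below + 10, key=lambda c: abs(c - rough))


def add_two_paths(a: int, b: int) -> int:
    return combine(rough_path(a, b), last_digit_path(a, b))


print(add_two_paths(36, 59))
```

Neither pathway alone knows the answer (the estimate for 36 + 59 is 96, the digit path only knows "ends in 5"), but together they pin it down, because the estimate is never off by more than half a decade.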
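The experiment isn't named here, so this does not reproduce it. It is a minimal sketch of the structural operation only, assuming a stack of residual blocks: each selected layer gets a deep copy with identical weights inserted directly after it. `ToyResidualLayer`, `duplicate_layers`, and `run` are hypothetical toys, not a real transformer or any library's API.

```python
import copy
import math
from dataclasses import dataclass


@dataclass
class ToyResidualLayer:
    """Hypothetical stand-in for a transformer block: h -> h + scale * tanh(h)."""
    scale: float

    def __call__(self, h: list[float]) -> list[float]:
        return [x + self.scale * math.tanh(x) for x in h]


def duplicate_layers(layers: list, indices: set[int]) -> list:
    """Return a new stack in which every layer in `indices` is followed by a deep copy of itself."""
    new_stack = []
    for i, layer in enumerate(layers):
        new_stack.append(layer)
        if i in indices:
            # The copy has identical weights but is an independent object.
            new_stack.append(copy.deepcopy(layer))
    return new_stack


def run(layers: list, h: list[float]) -> list[float]:
    """Pass the hidden state through every layer in order."""
    for layer in layers:
        h = layer(h)
    return h


base = [ToyResidualLayer(0.1), ToyResidualLayer(0.2), ToyResidualLayer(0.3)]
deeper = duplicate_layers(base, {1})
print(len(base), "->", len(deeper), "layers")
print(run(base, [1.0]), run(deeper, [1.0]))
```

Because each block adds to the residual stream rather than replacing it, a duplicated block applies the same refinement a second time instead of breaking the model, which is one plausible reason such surgery can work at all.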
If you ask a model to multiply 322423324 by 8675309232 without using tools, it's interesting to think about how it does it. Where are the intermediate results being maintained?
"In context" is the obvious answer... but if you view the chain of thought from a reasoning model, it may have little or nothing to do with arriving at the correct answer. It may even be complete nonsense. The model is working with tokens in context, but internally the transformer is maintaining some state with those tokens that seems to be independent of the superficial meanings of the tokens. That is profoundly weird, and to me, it makes it difficult to draw a line in the sand between what LLMs can do and what human brains can do.
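For contrast, this is what explicitly maintained intermediate results look like: schoolbook long multiplication keeps one shifted partial product per digit of the second factor and sums them at the end. It is only the paper-and-pencil method; nothing here claims a transformer computes the product this way, which is exactly the puzzle. The function name `long_multiply` is illustrative.

```python
def long_multiply(a: int, b: int) -> tuple[int, list[int]]:
    """Schoolbook long multiplication that keeps every intermediate row.

    Returns the final product plus the list of shifted partial products,
    i.e. the intermediate results a person would write down on paper.
    """
    partials = []
    for position, digit_char in enumerate(reversed(str(b))):
        digit = int(digit_char)
        # One row per digit of b, shifted left by its decimal position.
        partials.append(a * digit * 10 ** position)
    return sum(partials), partials


product, rows = long_multiply(322423324, 8675309232)
print(len(rows), "partial products")
print(product)
```

A human needs all ten rows somewhere (paper or working memory); the open question is where, if anywhere, a model holds the equivalent.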