I often wonder if human intelligence is essentially just predicting words and phrases in a cohesive manner. Once the context size becomes large enough to encompass all of a person's history, predicting becomes indistinguishable from thinking.

Maybe, but I don't think this is strictly how human intelligence works.

I think a key difference is that humans are capable of being inputs into their own system.

You could argue that any time humans do this, it is a consequence of all of their past experiences and such. It is likely impossible to say for sure; the question of determinism vs. non-determinism has been debated for literal centuries, I believe.

But if AI reaches a point where it can be an input to its own system, and develops systems analogous to humans' (long-term memory, decision trees updated by new experiences and knowledge, etc.), then does it matter in any meaningful way whether it is “the same” as a human brain or just an imitation of one? It feels like it only matters now because AIs imitate small parts of what human brains do but fall very short. If they could equal or exceed human minds, the question would be purely academic.

That's a lot of really big ifs, and we are likely still a long way from resolving any of them.

From what I understand, there is not really any realistic expectation that LLM-based AI will ever reach this complexity.

The body also has memory and instinct. It's non-hierarchical, although we like to think that the mind dominates or governs the body. It's not that it's more or less than predicting; it's a different activity. Humans also think with all their senses. It'd be more or less like having a modality-less or all-modality LLM. I'm not sure this is even possible with the current way we model these networks.

And not just words. There is pretty compelling evidence that our sensory perception is itself prediction: that the purpose of our sensory organs is not to deliver us 1:1 qualia representing the world, but to provide something more like error correction, updates to our predictions.
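
That idea (often called predictive processing) can be made concrete with a toy filter: the agent keeps a running prediction, and each noisy sense reading only nudges that prediction in proportion to the prediction error. The Python sketch below is a deliberately simplified illustration; the `perceive` function, the gain value, and the Gaussian noise are assumptions for the example, not a claim about how any real sensory system works.

```python
import random

def perceive(prediction: float, sense_reading: float, gain: float = 0.3) -> float:
    """Update an internal prediction from a noisy sense reading.

    The sense organ does not hand over the world "1:1"; it supplies an
    error signal (reading - prediction) that corrects the prediction.
    """
    error = sense_reading - prediction   # prediction error
    return prediction + gain * error     # correction, not replacement

if __name__ == "__main__":
    true_value = 10.0    # hidden state of the world (assumed for the toy example)
    prediction = 0.0     # the agent's initial guess
    for step in range(20):
        reading = true_value + random.gauss(0, 1.0)   # noisy sensation
        prediction = perceive(prediction, reading)
        print(f"step {step:2d}: reading={reading:6.2f}  prediction={prediction:6.2f}")
```

Run it and the prediction converges on the hidden value even though no single reading is trusted outright: each sensation only shifts the estimate by a fraction of the error, which is the "updates on our predictions" picture in miniature.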