> Humans are a) alot more capable than language models
That's a difference in degree of capability, not in architecture. A human kid is less capable than an adult, but you wouldn't say they think using different mechanisms.
> b) humans do not rely solely on the statistical relationships of language tokens. (...) Language tokens require organized language.
That's just a matter of how you provide the data. Multimodal models can accept vectors describing images, sounds, smells, or whatever else; all of them can be processed, and none of them are organised language.
> that our brains are somehow more closely related to silicon-based integrated circuits than they are to those of a Gorilla
That's an entirely different question from one about functional equivalence and the limits of capability.