You present these two as if they were mutually exclusive, but it's not at all clear why we can't be "completing the next token" in order to communicate and navigate social spaces. That last part is just where our "training" (as a species) comes from; it doesn't really say anything about the mechanism.

Because what motivates our language is a variety of needs, emotions and experiences as social animals. As such, we have goals and desires. We're not sitting there waiting to be prompted for some output.

You constantly have input from all your senses, which is effectively your "prompt". If you put a human in a sensory deprivation tank for long enough, very weird things happen, hallucinations among them.