> Because for me it's pretty simple, it's basically free to give access to reality. Just add "sensory organs" as it were.
I dunno what you mean by "free". The model is trained on text. To "give" the model sensory organs, it would need to be trained on data from those sensory organs.
Current models can predict text, because that's what the weights represent. Models with sensory organs will need to be trained on the output of those sensory organs.
That sounds close to impossible in the foreseeable future.
> I dunno what you mean by "free".
Reality is free. You don't have to waste any resources to model it; you just need to capture it.
> The model is trained on text.
See in my previous reply:
>LLM/AI/AGI/whatever will be
LLMs don't even have a sense of time because they work differently to a human brain.
Vision and audio are already in use in multimodal LLMs. So it's already been possible for a while.
Who said anything about vision and audio?