Hacker News

lelanthran 14 hours ago

> Because for me it's pretty simple, it's basically free to give access to reality. Just add "sensory organs" as it were.

I dunno what you mean by "free". The model is trained on text. To "give" the model sensory organs it would need to be trained on those sensory organs.

Current models can predict text, because that's what the weights represent. Models with sensory organs will need to be trained on the output of those sensory organs.

That sounds close to impossible in the foreseeable future.

natureiskino 5 hours ago

>I dunno what you mean by "free".

Reality is free. You don't have to waste any resources to model it, you just need to capture it.

>The model is trained on text.

See in my previous reply:

>LLM/AI/AGI/whatever will be

LLMs don't even have a sense of time because they work differently to a human brain.

bonoboTP 13 hours ago

Vision and audio are already in use in multimodal LLMs. So it was possible in the past.

lelanthran 10 hours ago

Who said anything about vision and audio?