> Re 1: You experience the world in real time (or close enough) via your senses, which combine to form a spatiotemporal sense: A sense of being a bounded entity in space and time. The LLM has none of that. They experience the world via stale old text and text derivatives.

It's not clear to me that this is a fundamental limitation. If you provide LLMs with a news feed, their picture of the world gets much closer to real time, and you can incrementally close the gap further in fairly obvious ways.
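To make the point concrete, here's a minimal sketch of the news-feed idea: filter a set of timestamped headlines down to a recent window and prepend them to the prompt. The function name, the tuple format, and the prompt layout are all my own assumptions for illustration, not any particular API.

```python
from datetime import datetime, timedelta, timezone

def build_prompt(question, feed_items, window_hours=24):
    """Prepend recent feed items to a question so the model sees
    near-real-time context instead of only its stale training data.

    feed_items: list of (timestamp, headline) tuples, with
    timezone-aware UTC timestamps (format is a hypothetical choice).
    """
    cutoff = datetime.now(timezone.utc) - timedelta(hours=window_hours)
    # Keep only headlines inside the recency window.
    recent = [headline for (ts, headline) in feed_items if ts >= cutoff]
    context = "\n".join(f"- {h}" for h in recent)
    return f"Recent headlines:\n{context}\n\nQuestion: {question}"
```

The resulting string would then be sent to the model like any other prompt; the point is only that "real-time" context is a plumbing problem, not a fundamental one.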

> Re 2: There's something tremendous in the fact, staring us right in the face, that LLMs are unable to meaningfully contribute to academic/medical research. I'm not saying that they need to perform on the level of a one-in-a-million Maxwell, DaVinci, or whatever. But as Dwarkesh asked one year ago: "What do you make of the fact that these things have basically the entire corpus of human knowledge memorized and they haven't been able to make a single new connection that has led to a discovery?"

LLMs have been around for a very short time. It wouldn't surprise me if researchers have already used them to make discoveries; if they haven't, they will soon. Then there's a question of attribution: if you're a researcher and you use an LLM to discover something, do you give it credit, or is it just a tool? There's a long, long history of researchers being less than honest about how they made their discoveries.

> Re 3: Sure, you can hold it by the hand and spoonfeed it. You can also create for it a mirror reality which doesn't exist, which is pure fiction. Given how limited these systems are, I don't suppose it makes much of a difference. There's no way for it to tell. The "human in the loop" is its interaction with the world. And a pale, meager interaction it is.

Our perception of reality is meager too. You can easily imagine how an LLM could be "plugged in" to reality. Again, there's nothing fundamental here.

> Re 4: Static, old images/video that they were trained on some months ago. That, too, is no way of interacting with the world.

No, you can send an LLM a video or image and it can "understand" it. It's not perfect, but like I said, the technology is already here to project video data into something LLMs can interact with.
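The usual trick for "projecting" video into model input is frame sampling: pick a handful of evenly spaced frames and send them as still images to a multimodal model. Here's a small sketch of the index selection; the function name and defaults are my own assumptions, not any specific library's API.

```python
def sample_frame_indices(total_frames, fps, frames_per_second=1.0):
    """Pick evenly spaced frame indices so a long video becomes a
    small set of still images a multimodal LLM can take as input.

    total_frames: number of frames in the video
    fps: the video's native frame rate
    frames_per_second: how many frames to keep per second of video
    """
    # Step between kept frames; clamp to 1 so we never skip everything.
    step = max(1, round(fps / frames_per_second))
    return list(range(0, total_frames, step))
```

For example, a 10-second clip at 30 fps (300 frames) sampled at 1 frame/second yields 10 frames, which is a small enough set to attach to a single multimodal request.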