> Even the most basic questions such as put a ball in a cup and place it on a table upside down then pick up the cup and put it in a box.
I do not think this is a great example. First, it is not a question. Second, it seems closely tied to robotics. A model itself cannot put a ball anywhere; it can only call tools and answer in text, images, etc.
An LLM seeing "put an x in a y and place it on a z upside down, then pick up the y and put it in a z2", followed by a question about what happens, could look up the properties of x, y, z, and z2 through RAG and still answer. Alternatively, this could be useful for coding, for example. And that is a very extreme example. Some basic language plus tool use could go quite far. I think that is a much more interesting direction than "here is a GPU that costs as much as a car".
I wasn't explicitly stating the question; I was paraphrasing a common test question for world knowledge.
The fact that you don't need a ball, cup, table, or even the ability to perform physical actions in order to work out where the ball ends up is in itself required knowledge.
The thing is, we tried that for decades, using more formal logic to build reasoning engines, and we never got them to be even a fraction as good or as general as learning-based LLMs are today.
I don't think my point is getting across. This is in the context of how much world knowledge a model needs to be trained on, not LLM vs. not LLM.