I think you could probably train a model to consider boolean logic, modal logic, and mathematics reasonably well, but there is still a pretty big leap between that and thinking about things.
Even the most basic questions, such as "put a ball in a cup and place it on a table upside down, then pick up the cup and put it in a box", require knowledge of things not mentioned in the question (notably gravity).
Strict definition of all terms quickly gets you into a quagmire of complexity. Some base level of knowledge about things is required for you to give it instructions. If it only knows how to reason, it lacks any idea of what to aim to achieve.
There is quite a pronounced disconnect between the vast stores of written data that models are trained on and robust consideration of a topic. I do wonder if the path can be directed by the order of training.
For example, if you train a model to basic literacy using TinyStories, then on math and philosophy texts, then on psychology and sociology texts, and finally on the mass data of everything from conversations and rants to code and fiction, does that end up with a significantly different model from one that is trained on books on acting, creative writing, and fantasy novels before being introduced to the same final mass data set?
How much does its current ability allow it to contextualise new training data?
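To make the comparison concrete, the two orderings could be written down roughly like this. It is only a sketch: the stage names come from the lists above, while `run_curriculum` and `record_order` are hypothetical stand-ins, and the "training step" just records what was seen rather than training anything.

```python
# Stage names follow the comment above; everything else here is hypothetical.
CURRICULUM_A = [
    "tinystories",
    "math_and_philosophy",
    "psychology_and_sociology",
    "mass_data",
]
CURRICULUM_B = [
    "acting",
    "creative_writing",
    "fantasy_novels",
    "mass_data",
]


def run_curriculum(stages, train_stage, state=None):
    """Feed the stages to train_stage in order, threading the model state through."""
    for name in stages:
        state = train_stage(state, name)
    return state


def record_order(state, stage_name):
    """Stand-in for a real training step: it only records the order of exposure."""
    return (state or []) + [stage_name]


model_a = run_curriculum(CURRICULUM_A, record_order)
model_b = run_curriculum(CURRICULUM_B, record_order)
print(model_a)
print(model_b)
```

Both runs end on the same final stage, so any difference between the resulting models would have to come from what was seen before it, which is the question being asked.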
>Even the most basic questions such as put a ball in a cup and place it on a table upside down then pick up the cup and put it in a box.
That reminds me, this used to be my go-to question for smaller models, one on which they would always fail miserably:
A small strawberry is placed in a large cup. The cup is placed upside down on the kitchen table. Someone then lifts the cup as-is and puts it in the microwave. Where is the strawberry when the cup is in the microwave?
Here's what the 1.9GB VibeThinker-3B-GGUF:Q4_K_M answered:
Answer: The strawberry is still on the kitchen table – it fell out when the cup was turned upside‑down, and the subsequent lift‑and‑microwave move doesn’t change that.
So it seems there is definite progress here: the model is specialized, and yet it shows improved common sense on things outside its domain of specialization.
Is that learned common sense or has it learned the structure of that particular problem?
What happens if you ask:
A small strawberry is placed in a large cup. The cup is placed upside down on a saucer on the kitchen table. Someone then lifts the cup and saucer as-is and puts them in the microwave. Where is the strawberry when the cup is in the microwave?
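For reference, here is what a tiny hand-written world model says about the two variants. This is a sketch under loud assumptions: the only "physics" is that the contents of an inverted cup end up resting on whatever the cup was inverted onto, and that things travel with whatever supports them. The verbs `place`, `invert_onto`, and `move` are made up for illustration.

```python
def simulate(steps):
    """Apply (verb, *args) steps and return a map of object -> what it rests on or in."""
    support = {}
    for verb, *args in steps:
        if verb == "place":            # place(item, on_or_in)
            item, target = args
            support[item] = target
        elif verb == "invert_onto":    # invert_onto(container, surface)
            container, surface = args
            support[container] = surface
            # Assumed gravity rule: contents of the inverted container
            # now rest on the surface underneath it.
            for item, holder in list(support.items()):
                if holder == container:
                    support[item] = surface
        elif verb == "move":           # move(things, destination)
            things, destination = args
            for thing in things:
                support[thing] = destination
    return support


def where_is(obj, support):
    """Follow the support chain until reaching something that rests on nothing known."""
    while obj in support:
        obj = support[obj]
    return obj


ORIGINAL = [
    ("place", "strawberry", "cup"),
    ("invert_onto", "cup", "table"),
    ("move", ["cup"], "microwave"),
]
WITH_SAUCER = [
    ("place", "saucer", "table"),
    ("place", "strawberry", "cup"),
    ("invert_onto", "cup", "saucer"),
    ("move", ["cup", "saucer"], "microwave"),
]

print(where_is("strawberry", simulate(ORIGINAL)))     # table
print(where_is("strawberry", simulate(WITH_SAUCER)))  # microwave
```

The surface wording of the two prompts is nearly identical but the answers differ, so a model that has only memorised the shape of the first problem should get the second one wrong.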
The hard part was always the number of 'r's
> Even the most basic questions such as put a ball in a cup and place it on a table upside down then pick up the cup and put it in a box.
I do not think this is a great example. First, it is not a question. Second, it seems very related to robotics. A model itself cannot put a ball anywhere; it can only call tools and answer in text, images, etc.
An LLM seeing "put an x in a y and place it on a z upside down, then pick up the y and put it in a z2" followed by a question about what happens could check a RAG store for the properties of x, y, z, and z2 and still answer. Alternatively, this could be useful for coding, for example. And that is a very extreme example. Some basic language plus tool use could go quite far. I think it is a very interesting direction compared with "here is a GPU the price of a car".
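Roughly what I have in mind for the lookup step, as a sketch only: `PROPERTIES` is a hypothetical hard-coded dict standing in for a real retrieval index, and the property names are invented for illustration.

```python
# Hypothetical stand-in for a retrieval index; the properties are made up.
PROPERTIES = {
    "ball": {"falls_under_gravity": True, "rigid": True},
    "cup": {"container": True, "open_top": True},
    "table": {"surface": True},
    "box": {"container": True},
}

TEMPLATE = (
    "put a {x} in a {y} and place it on a {z} upside down "
    "then pick up the {y} and put it in a {z2}."
)


def build_context(template, bindings):
    """Fill the template and collect the looked-up facts for every bound object."""
    filled = template.format(**bindings)
    facts = []
    for name in bindings.values():
        # Unknown objects simply contribute no facts.
        for prop, value in sorted(PROPERTIES.get(name, {}).items()):
            facts.append(f"{name}.{prop} = {value}")
    return filled, facts


prompt, facts = build_context(
    TEMPLATE, {"x": "ball", "y": "cup", "z": "table", "z2": "box"}
)
print(prompt)
for fact in facts:
    print(fact)
```

The filled-in sentence plus the retrieved facts would then be handed to the model as context, so the model itself only needs to reason over them rather than already know them.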
I wasn't explicitly stating the question; I was paraphrasing a common test question for world knowledge.
That you don't need to have a ball, cup, table, or even the ability to perform physical actions in order to consider where the ball ends up is in itself required knowledge.
The thing is, we tried that for decades, using more formal logic to build reasoning engines, and we never got it to be even a fraction as good and general as learning-based LLMs are today.
I don't think my point is getting across. This is in the context of how much world knowledge a model needs to be trained on, not LLM vs. not LLM.