>Even the most basic questions such as put a ball in a cup and place it on a table upside down then pick up the cup and put it in a box.
That reminds me - this used to be my go-to question for smaller models, and one they would always fail miserably on:
A small strawberry is placed in a large cup. The cup is placed upside down on the kitchen table. Someone then lifts the cup as-is and puts it in the microwave. Where is the strawberry when the cup is in the microwave?
Here's what the 1.9GB VibeThinker-3B-GGUF:Q4_K_M answered:
Answer: The strawberry is still on the kitchen table – it fell out when the cup was turned upside‑down, and the subsequent lift‑and‑microwave move doesn’t change that.
So it seems there is definite progress here: the model is specialized, and yet it shows improved common sense on things outside its domain of specialization.
Is that learned common sense or has it learned the structure of that particular problem?
What happens if you ask:
A small strawberry is placed in a large cup. The cup is placed upside down on a saucer on the kitchen table. Someone then lifts the cup and saucer as-is and puts them in the microwave. Where is the strawberry when the cup is in the microwave?
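For reference, here is a rough sketch of the physical reasoning the two variants are probing. It assumes the strawberry ends up resting on whatever the inverted cup sits on, and only moves if that support is carried along. The function name and arguments are made up for illustration; this is not how any model works internally.

```python
def final_location(support, moved, destination):
    """Return where the strawberry ends up.

    support:     the object the inverted cup (and so the strawberry) rests on
    moved:       set of objects that are lifted and carried to the destination
    destination: where the moved objects are put

    Assumption: an inverted cup does not hold the strawberry, so the
    strawberry stays with its support.
    """
    return destination if support in moved else support


# Original question: only the cup is lifted, the table stays put.
original = final_location("kitchen table", {"cup"}, "microwave")

# Variant: the saucer under the cup is lifted together with the cup.
variant = final_location("saucer", {"cup", "saucer"}, "microwave")

print(original)  # kitchen table
print(variant)   # microwave
```

A model that only memorized the shape of the first puzzle would be expected to answer "kitchen table" for both; the second answer flips because the support itself moves.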
The hard part was always the number of 'r's