I agree completely. I'm tempted to call it a clear falsification of any "reasoning" claim that some of these models have in their name.

But I think it's possible that an early cost-optimisation step prevents a short, seemingly simple question from ever being passed through to the system's reasoning machinery.

However, I haven't read anything on current model architectures suggesting that their so-called "reasoning" is anything other than more elaborate pattern matching. So these errors would still happen, just perhaps not as egregiously.

If you ask a bunch of people the same question in a context where they aren't expecting a trick question, some of them will say "walk". LLMs sometimes say "walk", and sometimes say "drive". Maybe LLMs fall for these kinds of tricks more often than humans; I haven't seen any study try to measure this. But calling that proof they can't reason is a double standard.