Similar questions trick humans all the time. The information is incomplete (where is the car?) and the question seems mundane, so we're tempted to answer it without a second thought. On the other hand, this could be the "no real world model" chasm that some suggest agents cannot cross.

If the car is at the car wash already, how can I drive to it?

By walking to the car wash, driving it anywhere else, then driving it to the car wash.

Thanks for restoring faith in parts of humanity!

I agree, I don't understand why this is a useful test. It's a borderline trick question, and it's worded weirdly. What does it demonstrate?

I don't know if it demonstrates anything, but I do think it's somewhat natural for people to want to interact with tools that feel like they make sense.

If I'm going to trust a model to summarize things, go out and do research for me, etc., I'd be worried if it made what look like comprehension or math mistakes.

I get that it feels like a big deal to some people if models give wrong answers to questions like this one, "how many r's are in strawberry" (yes, I know models get this right now, but it was a good example at the time), or "are we in the year 2026?"

In my experience the tools feel like they make sense when I use them properly, or at least I have a hard time relating their real failure modes to this walk/drive question and its bizarre, adversarial input. It just feels a little bit like garbage in, garbage out.

Okay, but when you're asking a model to do things like summarizing documents, analyzing data, or reading docs and producing code, you don't necessarily have a lot of control over the quality of the input.

Yes, my brain is just like an LLM.

…sorry, what?!