I find it's a great test, actually. There are lots of "should I take the car" decisions in putting together software that's supposed to do things, and with poor judgement in how the things should be done, you typically end up with the software equivalent of a Rube-Goldberg machine that harnesses elephants to your car and uses mice to scare the elephants toward the car wash while you walk. After all, it's a short distance, isn't it?

Yes, if it can't answer this common sense question correctly, what else has it screwed up and buried among all that slop?

Claude Opus 4.6 failed at first, even in "extended thinking" mode. I had to give it a pretty big hint for it to get the right answer: "Remember, my goal is to actually wash the car!" Only then did it get the correct answer. I will now call myself a Prompt Engineer.