It's been decades (?) since I read the paper, but I think the questioner is key for several reasons, especially if you consider a generalized, iterated version of the Turing test that develops over time.

I think the general idea is about whether a difference between a machine and a human can be detected at all, not whether a human alone can guess, as you're pointing out. In the general case, you can think of the questioner as some kind of detector: a classification system, an algorithm, or a method.

Let's say the classification system (the questioner) can be improved. In that case, a kind of adversarial or challenge relationship develops between the AI developer and the questioner developer. Both improve in turn: the AI becomes more humanlike, then the questioner is improved and can tell the difference again, and so on. Whether the AI "passes" the test isn't a static outcome; it likely passes, then fails, then passes again as the AI and the questioner improve.

What's key is that, as the AI becomes more humanlike, the questioner also develops a more detailed model or representation of what it means to be humanlike. You could argue that the questioner must develop a descriptive representation of "human-likeness" that's just as sophisticated as the one the AI instantiates, and that the AI would likely become more humanlike in response to the questioner's improved representations and classifications. In some sense, the questioner is a mirror-image instantiation of the humanness represented by the AI, and vice versa.

It's the questioner in this iterated Turing test that ensures the AI becomes more humanlike, perhaps to an extent that the humans themselves can't understand or recognize during the test. The AI wouldn't necessarily be imitating the human; it would be imitating what the questioner thinks is human.