The major problem with most retellings of the test is that we don’t actually run it as described. The game is played with three participants: two competitors and a questioner. Of course, today the assumption is that it’ll be just a human and a machine, with no questioner. The goal was not for the machine to trick a human, but for the machine to appear more human to a questioner than a human being questioned at the same time.

Does any of that matter? I have no idea. I suspect Turing would say no, about as flippantly as he writes in the paper: “The original question, ‘Can machines think?’ I believe to be too meaningless to deserve discussion. Nevertheless I believe that at the end of the century the use of words and general educated opinion will have altered so much that one will be able to speak of machines thinking without expecting to be contradicted.”

I’d strongly recommend that anyone interested in having genuine discussions about LLMs read the paper. It’s a quick and easy read that’s still relevant; it reads as though it could have been a blog post linked here yesterday.

It's been decades (?) since I read the paper, but I think the questioner is key for multiple reasons, especially if you consider a generalized iterated version of the Turing test as it develops into the future.

I think the general idea is about being able to detect a difference between a machine and a human, not whether a lone human can guess, as you're pointing out. In the general case, you can think of the questioner as some kind of detector: a classification system, an algorithm or method.

Let's say the classification system, the questioner, can itself be improved; in this sense a kind of adversarial or challenge relationship develops between the AI developer and the questioner developer. Both improve: the AI becomes more humanlike, then the questioner is improved and can tell the difference again, and so on. Whether or not the AI "passes" the test isn't a static outcome; it likely passes, then fails, then passes again as both the AI and the questioner improve.
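
To make that loop concrete, here's a deliberately crude toy sketch in Python (nothing from Turing's paper; every name, distribution, and threshold below is made up for illustration). Responses are collapsed into a single numeric feature, the questioner is a simple statistical classifier that re-fits itself every round, and the AI developer only ever nudges its outputs toward the questioner's current fitted model of "human", not toward real humans:

    import math
    import random
    import statistics

    random.seed(0)

    # Toy assumptions: a "response" is one number, real humans draw it from a
    # fixed Gaussian, and the AI draws it from a Gaussian it can adjust.
    HUMAN_MEAN, HUMAN_SD = 1.0, 0.3
    N = 2000  # transcripts judged per round

    def sample(mean, sd, n):
        return [random.gauss(mean, sd) for _ in range(n)]

    def gauss_pdf(x, mean, sd):
        return math.exp(-((x - mean) ** 2) / (2 * sd * sd)) / (sd * math.sqrt(2 * math.pi))

    def accuracy(judge, humans, ais):
        # Fraction of transcripts the questioner labels correctly.
        correct = sum(judge(x) for x in humans) + sum(not judge(x) for x in ais)
        return correct / (len(humans) + len(ais))

    # The AI starts clearly non-human: wrong mean, unnaturally low variance.
    ai_mean, ai_sd = 0.0, 0.1
    questioner_uses_variance = False  # the "questioner developer" upgrade, later

    for rnd in range(1, 11):
        humans = sample(HUMAN_MEAN, HUMAN_SD, N)
        ais = sample(ai_mean, ai_sd, N)

        # The questioner re-fits its model of both populations every round.
        h_mean, h_sd = statistics.mean(humans), statistics.stdev(humans)
        a_mean, a_sd = statistics.mean(ais), statistics.stdev(ais)
        if questioner_uses_variance:
            judge = lambda x: gauss_pdf(x, h_mean, h_sd) > gauss_pdf(x, a_mean, a_sd)
        else:
            judge = lambda x: abs(x - h_mean) < abs(x - a_mean)  # nearest-mean only

        acc = accuracy(judge, humans, ais)
        passed = acc < 0.55  # close to chance: the questioner can't reliably tell
        print(f"round {rnd:2d}: questioner accuracy {acc:.2f} -> AI {'passes' if passed else 'is detected'}")

        # The AI developer imitates what the questioner currently treats as human:
        # it shifts toward the questioner's fitted human distribution, not toward
        # the humans themselves.
        ai_mean += 0.6 * (h_mean - ai_mean)
        if questioner_uses_variance:
            ai_sd += 0.6 * (h_sd - ai_sd)

        # The questioner developer responds to losing with a richer model of
        # "human-likeness" (here: variance as well as mean).
        if passed and not questioner_uses_variance:
            questioner_uses_variance = True

Running it, the questioner's accuracy roughly decays toward chance, jumps back up once the questioner developer starts modeling variance as well as the mean, then decays again as the AI matches that too, so the pass/fail verdict flips across rounds rather than being a one-time result.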

What's key is that you could argue the AI becomes more humanlike, but at the same time the questioner also develops a more detailed model or representation of what it means to be humanlike. In that case, the questioner must develop a descriptive representation of "human-likeness" that's just as sophisticated as the one the AI instantiates, and what would likely happen is that the AI becomes more humanlike in response to the questioner's improved representations and classifications. The questioner is, in some sense, a mirror-image instantiation of the humanness represented by the AI, and vice versa.

It's the questioner in this iterated Turing test that ensures the AI becomes more humanlike, maybe to an extent the humans themselves aren't able to understand or recognize during the test. The AI wouldn't necessarily be imitating the human; it would be imitating what the questioner thinks is human.