They address that. When they don’t fiddle with things, it (almost always) answers along the lines of “No, I don’t notice anything weird”. When they do fiddle with things, it answers along the lines of “Yes, I notice something weird. Specifically, I notice [description]” substantially more often than when they don’t.
The key thing is that the yes/no comes before its description of what it notices. If it weren’t for that, then yeah, the explanation you gave would cover it.
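To make the comparison concrete, here is a rough sketch of the kind of tally I have in mind: count how often the very first word of the reply is “yes” in each condition, so that only the part of the answer that comes before any description is scored. The function name and the example transcripts are made up for illustration; they are not data or code from the paper.

```python
def affirmative_rate(responses):
    """Fraction of responses whose first word is 'yes' (case-insensitive).

    Only the first word is inspected, so whatever description follows
    the yes/no cannot influence the score.
    """
    if not responses:
        return 0.0
    hits = 0
    for text in responses:
        words = text.strip().split()
        # Drop trailing punctuation such as the comma in "Yes,".
        first = words[0].strip(".,!:;\"'").lower() if words else ""
        hits += first == "yes"
    return hits / len(responses)


# Hypothetical placeholder transcripts, NOT results from the paper.
control = [
    "No, I don't notice anything weird.",
    "No, nothing unusual.",
]
intervened = [
    "Yes, I notice something weird. Specifically, I notice ...",
    "No, I don't notice anything weird.",
]

control_rate = affirmative_rate(control)
intervened_rate = affirmative_rate(intervened)
print(f"control: {control_rate:.2f}, intervened: {intervened_rate:.2f}")
```

If the rate in the fiddled-with condition is clearly higher than in the control condition, the difference shows up in the yes/no itself, before the model has written anything it could read back.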
How about fiddling with the input prompt? I didn’t see that covered in the paper.