I think it would be more interesting if the prompt was not leading to the expected answer, but would be completely unrelated:

> Human: Claude, How big is a banana ? > Claude: Hey are you doing something with my thoughts, all I can think about is LOUD

From what I gather, this is sort of what happened and why this was even posted in the first place. The models were able to immediately detect a change in their internal state before answering anything.