Anthropic researchers do that quite a lot, their “escaping agent” (or whatever it was called) research that made noise a few month ago was in fact also a sci-fi roleplay…

Just to re-iterate again... If I read the paper correctly, there were 0 false positives. This means the prompt never elicited a "roleplay" of an injected thought.