It's hard for me to believe that the model coherently memorized both the video and audio of a relatively obscure Let's Play, and that a simple prompt was enough to surface it (the term "Basilisk tank" would likely not be in the video metadata either). That is why the person who made that tweet, who has far more prompting experience than I do, was shocked.

It’s hard for you to believe, sure, and I recognize the context of who tweeted it.

I still maintain that’s the kernel it’s getting it from. It’s impressive; I’m just not really shocked by it as a concept.