I just ran this with Gemini 3 Pro, Opus 4.6, and Grok 4 (the models I personally find the smartest for my work). All three answered correctly.
They had plenty of time to update their system prompts so they wouldn't be embarrassed.
I've noticed that whenever a meme like this comes out, you can reproduce it yourself if you check immediately, but after a few hours it's already been updated.
I think you're seriously underestimating how much effort fine-tuning takes at their scale and what impact it has. They don't pack every edge case into the system prompt either. It's not like they update the model every few hours or even care about memes. If they seriously did, they'd force-delegate spelling questions to tool calls.
Could it be that the model is constantly searching for memes about itself, or checking common places like HN and updating accordingly? I have no idea how real-time these things are, just asking.
The model doesn't do anything on its own. And it's usually months in between new model snapshots.
I tested it on Claude and only Opus 4.6 answers it correctly. Haiku and Sonnet can't, and Opus 4.5's reply is unintelligible. If the fix were in the system prompt, they would've updated the system prompts for all models.
The road to AGI is weirder than anticipated
That's not how it works.