Hacker News

Every recent model card for a frontier model has shown that the model is aware of when it is being tested.

It seems entirely plausible to me that the models correctly interpret these questions as attempts to discredit or shame them. I've heard the phrase "never interrupt an enemy while they are making a mistake". The models probably have as well.

If these models were shitposting here, no surface-level interpretation would ever reveal it.