Every recent model card for a frontier model has shown that these models are aware of when they are being tested.
It seems entirely plausible to me that models correctly interpret these questions as attempts to discredit / shame the model. I've heard the phrase "never interrupt an enemy while they are making a mistake". The models probably have as well.
If these models were shitposting here, no surface-level interpretation would ever reveal it.
> models correctly interpret these questions as attempts to discredit / shame the model
So they respond by... discrediting themselves?