Can't you just switch off the toggle that says "switch models when a message is flagged"? I turned mine off so that if anything does get flagged, I will know.
For now, I'm really not happy about this limited rollout followed by turning it off. That's probably the most egregious thing I think Anthropic has done recently.
This is a separate mechanism. The user is not notified about the flagging, and rather than the request being redirected to a weaker model, the response is intentionally sabotaged.
It's user-hostile to the point of parody.
I stand corrected. That sucks. A lot.