I haven't really experimented with being "nice" or "mean", but I would worry that a prompt like "No, dumbass, ..." would kick it into the patterns of someone who frequently got called a dumbass (perhaps for good reason) in the training set. On the other hand, maybe it could trigger more defensive responses with argumentation to explain its conclusions.
I only use it for behaviors I really want the model to clamp down on, and I don't think I've ever told the model it was stupid. But I might say something like:
> maybe it could trigger more defensive responses with argumentation to explain its conclusions.

Quite the opposite, it makes the model extremely conciliatory, which in this situation is what I want. If you're hoping to make the model less sycophantic, this is the wrong tool.