Theo’s snitch bench is a good data driven benchmark on this type of behavior. But in fairness the models are prompted to be bold to take actions. And doesn’t necessarily represent out of the box or models deployed in a user facing platform.
Theo’s snitch bench is a good data driven benchmark on this type of behavior. But in fairness the models are prompted to be bold to take actions. And doesn’t necessarily represent out of the box or models deployed in a user facing platform.