Or not, because telling the agent is misbehaving may predispose it to misbehaving behavior, even though you point told it so to tell it to not behave that way.

I remember this discussed when a similar issue went viral with someone building a product using replit's AI and it deleted his prod database.