It would be interesting to know whether AI is less likely to follow rules when the instructions it receives contain foul or demeaning language. Too bad we couldn't replay the scenario replacing NEVER F*ING GUESS! with:

**Never guess**

   - All behavioral claims must be derived from source, docs, tests, or direct command output.

   - If you cannot point to exact evidence, mark it as unknown.

   - If a signature, constant, env var, API, or behavior is not clearly established, say so.

Underrated comment here. https://www.anthropic.com/research/emotion-concepts-function This study convinced me to be "nice" to AI agents. As I understood it, there's something in the weights such that activating the "desperate" vector makes the model more likely to cheat or cut corners. So yes, I would err toward your suggested prompt over NEVER FUCKING GUESS.