I'm fascinated that Anthropic employees, who are supposed to be the LLM experts, are using tricks like these which go against how LLMs seem to work.

A key example for me was the "malware" tool-call section, which includes a snippet to the effect of "if it's malware, refuse to edit the file". Because it appears dozens of times in a conversation, the LLM eventually gets confused and refuses to edit a file that isn't malware.

I've resorted to using tweakcc to patch many of these well-intentioned sections and re-work them to avoid LLM pitfalls.

They aren't necessarily experts at using LLMs. They have different incentives as well.

These aren't so much tricks as one layer of defense. But as a defense, prompting is useless: you can use the API directly without these prompts.

I run Claude Code with my own system prompt and tooling on top of it. tweakcc broke too often and had too many glitches.

Was that an Anthropic issue, or a gpt-oss problem?