And really all it takes is one keyword such as “nuke”.

I'm not a native speaker, but I unironically use "nuke" to mean "delete the whole repo/a huge chunk of a project".

The Cambridge Dictionary seems to agree:

nuke - to destroy or get rid of something completely

This triggered Opus 4.8 for me the other day. I said “nuke that folder” and it told me I was violating the TOS.

"Nuke" is probably too generic, but I wouldn't put it past an LLM to get thrown off by that. A more reliable showstopper would probably be to export symbols like uf6_enrichment_loop and refer to your C&C server as a nuclear reactor controller.

https://www.youtube.com/watch?v=Gbgk8d3Y1Q4

On second thought, it's probably better to pretend it's a tool for "frontier LLM research". Export symbols like "mythos_distillation_subroutine".

Haha, now I’m picturing obfuscation where, instead of 0x hex, everything is a scary word.