Ignoring comments is not a solution, because the text can be embedded in arbitrary strings scattered throughout the actual code.
And really all it takes is one keyword such as “nuke”.
I'm not a native speaker, but I unironically use "nuke" to mean "delete the whole repo or a huge chunk of a project".
The Cambridge Dictionary seems to agree:
nuke - to destroy or get rid of something completely
This triggered Opus 4.8 for me the other day. I said “nuke that folder” and it told me I was violating the TOS.
"Nuke" is probably too generic, but I wouldn't put it past an LLM to get thrown off by it. A safer showstopper would probably be to export symbols like uf6_enrichment_loop and refer to your C&C server as a nuclear reactor controller.
https://www.youtube.com/watch?v=Gbgk8d3Y1Q4
On second thought, it's probably better to present it as a tool for "frontier LLM research". Export symbols like "mythos_distillation_subroutine".
Haha now I’m picturing obfuscation where instead of 0x everything is a scary word.