Hacker News

On refusals: I found that many models are fine with security work if they think what they're working on is local. They push back hard if they think it's a live target.

GPT-5.5 xhigh refused to perform RE on a live JS VM. I had it extract the VM from the target, which it was happy to do, and then, in a clean session, had it work on this offline artifact, which it was again happy to do.

Then I found an even simpler trick: I proxied the target through localhost, and it was happy to do anything against it.

Opus is a different story. Claude does so many mid-turn prompt injections and classifier checks that probably 30% of its context consists of "refuse to do work" lines. It refuses even to scrape a page.