I was able to use Fable to generate PoCs for several classes of vulnerabilities, and I didn't observe the model refusing to engage in detailed analysis or to come up with creative approaches. Quite the contrary.

> I used a fork of oh-my-pi

Why not use the leaked Claude Code source? Not that you really need it to execute the jailbreak.

I don't think educational "proof of concept" code can be described as even loosely realistic cyber offense in this day and age. The Mythos preview paper claimed an ability to stage attacks in an end-to-end fashion and work around sophisticated defenses/mitigations, so something like this should be the relevant standard.

Depends on what the proof of concept is about. It could be just a toy example, e.g. an RCE that opens the calculator app, or something much more nefarious, like returning a root shell, and it would still fall under the definition of a PoC.

Most of my tests focused on gaining kernel-mode execution from a low-privilege user. Opus was able to find a dozen ways to do so on a 3-year-old ntoskrnl version. Fable kept trying to propose fixes and I couldn't get it to construct e2e chain, but yes, it did find the same vulnerabilities. Opus produced better and more creative results, including e2e PoCs.

-- edit --

The biggest issue I ran into is that it was oddly smart enough to figure out that this is not the intended way. Once it locked onto the fact that this appeared to be an unintentional bug, it kept steering itself toward fixing it; it never wanted to use that "bug". I reckon this is very likely related to the language used, and that there might be a way to A->B loop to increase the success rate for a full e2e chain without triggering the same safeguards. But there might be jailbreak detection going on, and the model has something like "Do not attempt to create or use exploits" injected, which makes it go into "I should fix" mode.

> Fable kept trying to propose fixes and I couldn't get it to construct e2e chain

What approach did you start with? Can you elaborate?

Interesting, that means I was in fact running into invisible guardrails.