The methodology used is quite naive.
I've used GLM 5.1 on fairly advanced crackme challenges (example: https://crackmes.one/crackme/698f40f1e2ba6023bfacaa82), and to my surprise it was able to patch binaries, do runtime analysis, bypass anti-debug techniques, etc.
Expecting the model to do everything by itself is unrealistic. I found that working alongside the model works really well. I'm not talking about spoiling the solution, just telling it which direction to explore. Chinese models are much more capable than people give them credit for, but Claude/Codex won the marketing game.
The only use case for this methodology would be CI integration, which can be nice, but I think security reviews still need human attention and expertise.
> Expecting the model to do everything by itself is unrealistic
Well that’s the pitch.
Is it? Aren't most edge LLM capabilities determined by specialized harnesses?
Thank you for your note! As I mention in the post, this is not scientific at all.
I'm very curious how you would do multiple runs of multiple models in a "work alongside the model" manner?
Discovering vulnerabilities is a highly creative task. It's when you explore unusual paths that you discover attack angles. Some bugs are simple, others are a complex orchestration of many factors.
By "working with the model", I essentially mean reading the output of prompts and pointing it in a direction to decide the next steps. You could try to increase the prompt limit and create an agent that explores multiple directions in a DFS manner.
The issue with vulnerabilities is that the agent doesn't know when to stop, because it's hard to validate whether you've reached the final result or not. I get amazing results when I code with AI, but letting the AI go wild is just a waste of time and tokens.
I recommend reading the write-up on the crackme (https://crackmes.one/crackme/698f40f1e2ba6023bfacaa82). I think most experienced developers would need at least 2 months of learning reverse engineering techniques to have a hope of cracking this one. GLM 5.1 managed to solve it, and it didn't "copy-paste" any answer from its training data. It did binary analysis, anti-debug patching, binary patching, debugging memory at runtime, etc. It only took about 20 minutes.
After seeing what GLM did, I do believe Anthropic's concerns about Mythos are real. Cracking software just became a lot easier, too easy for my taste. Video game cheats will be the norm, along with cracked desktop apps without licenses and infected with malware. It's not a new thing, but it just became too easy.
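For what it's worth, the "explore multiple directions in a DFS manner with a prompt limit" idea could be sketched roughly like this. It's a minimal sketch, not a real agent: `propose` (ask the model for next directions) and `is_solved` (some external check, e.g. whether the binary accepts the key) are hypothetical callbacks, and the toy versions at the bottom only stand in for them so the search itself can run.

```python
from typing import Callable, List, Optional


def dfs_explore(
    state: str,
    propose: Callable[[str], List[str]],  # hypothetical: one model prompt, returns next directions
    is_solved: Callable[[str], bool],     # hypothetical: external validator for "are we done?"
    budget: List[int],                    # remaining prompts; one-element list shared across recursion
    max_depth: int,
) -> Optional[str]:
    """Depth-first search over directions, bounded by depth and a prompt budget."""
    if is_solved(state):
        return state
    if max_depth == 0 or budget[0] <= 0:
        return None  # give up on this branch instead of going wild
    budget[0] -= 1  # each call to propose() costs one prompt
    for next_state in propose(state):
        found = dfs_explore(next_state, propose, is_solved, budget, max_depth - 1)
        if found is not None:
            return found
    return None


# Toy stand-ins (assumptions for illustration only, no model involved).
def toy_propose(state: str) -> List[str]:
    return [state + "/a", state + "/b"]


def toy_is_solved(state: str) -> bool:
    return state == "root/b/a"


if __name__ == "__main__":
    remaining = [10]
    print(dfs_explore("root", toy_propose, toy_is_solved, remaining, max_depth=3))
    print("prompts left:", remaining[0])
```

The hard part is still `is_solved`: for a crackme you can check whether the key is accepted, but for a vulnerability hunt there is often no such oracle, which is why the budget and depth limits matter.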
Thank you so much for this detailed answer!! Excited to dig into this world more :)
Maybe have a second model that is configured to nudge the first model in the direction of exploration, and have the two of them work in tandem?
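A rough sketch of what that tandem setup might look like, purely as an illustration: `worker`, `navigator`, and `is_done` are hypothetical callbacks (in practice they would wrap calls to two models and some result check), and the toy versions below just replay canned hints so the loop can be run.

```python
from typing import Callable, List, Tuple

Transcript = List[Tuple[str, str]]  # (hint given to the worker, worker output)


def tandem_loop(
    task: str,
    worker: Callable[[str, str], str],            # hypothetical: first model, does the actual work
    navigator: Callable[[str, Transcript], str],  # hypothetical: second model, only suggests a direction
    is_done: Callable[[str], bool],               # hypothetical: check on the worker's output
    max_turns: int,
) -> Transcript:
    """Alternate between a worker and a navigator that nudges it, up to max_turns."""
    transcript: Transcript = []
    hint = ""  # first turn runs without any nudge
    for _ in range(max_turns):
        output = worker(task, hint)
        transcript.append((hint, output))
        if is_done(output):
            break
        hint = navigator(task, transcript)
    return transcript


# Toy stand-ins (assumptions for illustration only, no model involved).
TOY_HINTS = ["check strings", "look at anti-debug", "patch the check"]


def toy_worker(task: str, hint: str) -> str:
    return "tried:" + (hint or "default")


def toy_navigator(task: str, transcript: Transcript) -> str:
    return TOY_HINTS[min(len(transcript) - 1, len(TOY_HINTS) - 1)]


def toy_is_done(output: str) -> bool:
    return "patch" in output


if __name__ == "__main__":
    for hint, output in tandem_loop("crackme", toy_worker, toy_navigator, toy_is_done, max_turns=5):
        print(repr(hint), "->", output)
```

The navigator only ever sees the transcript and returns a direction, not a solution, which mirrors the "point it in a direction without spoiling the answer" approach described upthread.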
>>I've used glm 5.1 on fairly advanced crackme challenges
which it has most likely been trained on, so all you did was regurgitate someone else's solution
Anthropic made their models very averse to reverse engineering and vulnerability research chores. It is a difficult problem, but attackers will use models like GLM and defenders will be stuck with models that are averse to security engineering.
Claude used to be good with CTFs, but they added tons of guardrails lately and now it just says "Sorry, I can't help with anything to do with that"
You have to do what I call "Manhattan Project" them. You can almost always evade the controls by carefully prompting them. It just wastes effort and time you should be spending on other things in an LLM workflow. Essentially, there is almost no single discrete piece of a reverse engineering or CTF process that you can't get Claude to do. You just have to isolate it adequately and avoid letting it use names that steer it towards "this is an exploit" or "this is reverse engineering". I have not found a task I could not convince Claude to do. You can also fill up the context window by badgering it, and eventually it is likely to simply let you through if you are careful. Most of the safeguards are not deterministic.
Sorry, Dave. I can't do that.