Anyone care to share which models and which prompts actually lead to finding these kinds of vulnerabilities? Or the narrowing-down workflow that can get an LLM to discover them? Surely just telling Claude "Find all vulnerabilities in this project LOL" isn't enough? I hope?

The Anthropic researchers have said their flow is as simple as:

1. Pick a file to seed as a starting place.

2. Ask the LLM (in an agent harness) to find a vulnerability by starting there.

3. If it claims to have found something, ask a second agent to verify the claim, e.g. by building a proof-of-concept exploit.

4. If both agents conclude there is a vuln, then with the latest models you have almost certainly found something real.

Just run it against every file in a repo, or select a subset, or have an LLM select files with a simple "which files look likely to have vulns?".
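The loop above can be sketched roughly as follows. This is only an illustration of the find-then-verify structure, not Anthropic's actual harness; `ask_agent` is a hypothetical stand-in for however you invoke your model, and the `NO_VULN` sentinel is an assumption of this sketch.

```python
from pathlib import Path

def ask_agent(prompt: str) -> str:
    """Hypothetical stand-in for a real agent-harness call.
    Replace the body with your actual LLM invocation; here it
    always reports nothing so the sketch runs on its own."""
    return "NO_VULN"

def triage_repo(repo: Path, extensions=(".c", ".py")) -> list:
    """Two-stage loop: a finder agent seeded per file, then an
    independent verifier agent for any claimed vulnerability."""
    findings = []
    for path in sorted(repo.rglob("*")):
        if path.suffix not in extensions:
            continue
        # Stage 1: seed the finder with this file as a starting place.
        claim = ask_agent(
            f"Starting from {path}, find a vulnerability in this repo."
        )
        if claim == "NO_VULN":
            continue
        # Stage 2: a second agent tries to verify/exploit the claim.
        verdict = ask_agent(
            f"Verify this claimed vulnerability with an exploit:\n{claim}"
        )
        if verdict != "NO_VULN":
            findings.append((path, claim, verdict))
    return findings
```

The point of the second stage is that false positives from the finder are cheap to discard: only claims that survive an independent exploit attempt are surfaced.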

So basically yes, it is that simple. It's just a matter of having the money to pay for the tokens.

Thanks for the reply. Pretty remarkable.
