There's a lot of skepticism in the security world about whether AI agents can "think outside the box" enough to replicate or augment senior-level security engineers.
I don't yet have access to Claude Code Security, but I think that line of reasoning misses the point, and maybe even the real benefit.
Just like architectural thinking is still important when developing software with AI, creative security assessments will probably always be a key component of security evaluation.
But you don't need highly paid security engineers to tell you that you forgot to sanitize input or that you're using a vulnerable component, or to flag any of the myriad issues we currently use "dumb" scanners for.
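For instance, here's the kind of scanner-tier finding I mean (a minimal Python sketch with a hypothetical users table; any SAST tool flags the first version):

    import sqlite3

    def get_user_unsafe(conn: sqlite3.Connection, username: str):
        # Scanner-tier bug: user input interpolated straight into SQL.
        # username = "x' OR '1'='1" turns this into a table dump.
        return conn.execute(
            f"SELECT id, email FROM users WHERE name = '{username}'"
        ).fetchall()

    def get_user_safe(conn: sqlite3.Connection, username: str):
        # The fix every scanner suggests: a parameterized query.
        return conn.execute(
            "SELECT id, email FROM users WHERE name = ?", (username,)
        ).fetchall()

Catching that doesn't take a senior engineer's creativity; it takes coverage and consistency.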
My hope is that tools like this can help automate away the "busywork" of security. We'll see how well it really works.
LLMs, and Claude in particular, are very capable security engineers. My startup builds offensive pentesting agents (so more like red teaming), and if you give one a few hours to churn on an endpoint, it will find all sorts of wacky things a human won't bother to check.
As a pentester at a Fortune 500: I think you're on the mark with this assessment. Most of our findings (internally) are "best practices"-tier stuff (make sure to use TLS 1.2, cloud config findings from Wiz, occasionally the odd IDOR vuln in an API set, etc.). In a purely timeboxed scenario, I'd feel much more confident in an agent's ability to look at a complex system and identify all the "best practices" kind of stuff than in a human being's.
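To make "the odd IDOR vuln" concrete, a minimal sketch (hypothetical Flask endpoint and invoice schema; the auth stub stands in for real session handling):

    from flask import Flask, abort, jsonify

    app = Flask(__name__)

    # Hypothetical store: each invoice records which account owns it.
    INVOICES = {
        1: {"owner_id": 42, "amount": 1200},
        2: {"owner_id": 7, "amount": 310},
    }

    def current_user_id() -> int:
        # Stand-in for real session/token auth in this sketch.
        return 42

    @app.route("/api/invoices/<int:invoice_id>")
    def get_invoice(invoice_id: int):
        invoice = INVOICES.get(invoice_id)
        if invoice is None:
            abort(404)
        # The IDOR: without this check, any authenticated user can read
        # any invoice just by incrementing the id in the URL.
        if invoice["owner_id"] != current_user_id():
            abort(403)
        return jsonify(invoice)

An agent grinding through every endpoint looking for that exact missing ownership check is precisely the timeboxed scenario where I'd bet on it over a human.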
Security teams are expensive and deal with huge streams of data and events on the blue side: seems like human-in-the-loop AI systems are going to be much more effective, especially with the reasoning advances we've seen over the past year or so.
Every conversation I've been a party to has been premised on humans in the loop; I think fully automated luxury space vulnerability research is something that only exists in message board imaginations.
I am seeing something closer to the opposite of skepticism among vulnerability researchers. It's not my place to name names, but for every Halvar Flake talking publicly about this stuff, there are 4 more people of similar stature talking privately about it.
People use whatever tools are the most effective, and they have plenty of incentive not to talk publicly about them. I think the era of openness has passed us by. But why does stature matter anyway? If I look at Chromium or MSRC bug reports, scarcely any of the submitters are from Europe or the US, and they certainly don't have anything resembling stature. From what I know, that guy hasn't done anything of note in the field in a long time; he's kind of a boomer (you too, no disrespect).
Vulnerability research is exciting and profitable, but it has three problems. First, it's mentally exhausting. Second, the income it generates is very unpredictable. Third, it's sort of... futile. You can find 1,000 vulnerabilities and nothing changes.
So yeah, it's the domain of young folks, often from countries where $10k or $100k goes much farther than in the US. But what happens to vulnerability researchers once they turn 35? They often end up building product security programs or products to move the needle, usually out of the limelight. They're the ones writing checks to the young 'uns to test these defenses and find more bugs, and they're the ones who will be making the call to augment internal or external testing with LLMs.
And FWIW, the fact that the NSA or the SVR now has to pay millions for a good weaponized zero-day is a testament to this "boomer" work being quite meaningful.
Claude Opus 4.6 has been amazing at identifying security vulnerabilities for us, with a false-positive rate under 50%.