I think using open-weight models will solve this. I believe they have nearly caught up, and much of the gain is in the harnesses or in properly orchestrating subqueries. (I'm no expert, just my opinion.)

When the open-weight models catch up, and if they don't get banned through lobbying by OpenAI and Anthropic, you'll be able to use them to properly secure your software.

Pretty sure the secret sauce is in the summarised thinking. Maybe a better thought process… But I have a feeling it's server-side tools and a scratch space to prepare the reply.

Sometimes the summarised thoughts include stuff that makes no sense unless the model has a workspace on the server, stuff like "I am now writing x to file y".

Not championing it, but this is where something like OpenClaw comes into play, right? The harness around the model, the ability to call tools, etc.

I'm no cyber expert; maybe one can weigh in.

Are there zero-days that only a true genius can discover? Or can a smart-enough model, run over the codebase for enough time, discover them all?

Like, as we get smarter and smarter models, do we expect each new generation to keep finding vulnerabilities, or to plateau?

A large part of vulnerability analysis is just having the time to crunch through enough possibilities. Expertise and smarts definitely speed this up but there's a lot of just turning the crank until something falls out. Even a relatively dumb model with some good prompting will find vulnerabilities if you ask it to and give it the time and resources to do so.

Completely agree. It's all about time spent.

I've been in the security industry a long time as a software engineer. Security research is no different from any other engineering discipline. It comes down to the time you are willing to invest and which layer of abstraction you focus on.

All of this pearl-clutching and hand-wringing over the capabilities of the models seems silly to me. It has much less to do with some magical cybersecurity ability and much more to do with the increasing ability of models to stay on task over long horizons. Any passionate engineer will recognize this - if you grind 10,000 hours you will find the solution to most problems, the problem is most people lack the motivation to even start, and are too risk averse to play hacker.

The NSA's claim that all government systems were hacked by Mythos, and that they were shocked by it, is farcical. They have been hacked over and over and over by many who took the risk and tried.

They're acting as if they had hired a competent red teamer to do internal pen testing for the first time, which we know is absolutely not the case. They have been doing it for years, almost certainly surfacing the exact same kinds of findings each time, but they haven't been honest with the public about it and can now scapegoat Mythos.

> Any passionate engineer will recognize this - if you grind 10,000 hours you will find the solution to most problems, the problem is most people lack the motivation to even start, and are too risk averse to play hacker.

This. I'd love to spend my whole day hacking stuff, but I need to pay my bills.

Now, with AI tooling, my late-night/weekend hobby hacking is at least getting done. I'm definitely making progress on things I began two years ago and had to stop when other life priorities took over.

That entirely depends on whether a "smart-enough" model counts as a genius, or where that cutoff lies.

To your second question: a clear plateau would be a piece of software that is 100% secure, without vulnerabilities. Since that's impossible for anything more than a trivially simple program, particularly once you consider the surrounding ecosystem, I think there won't be a plateau. If you use model A to secure program Dog, a smarter model B could find a vulnerability in Dog or just skip to attacking Dog's OS, firmware, etc.