> So what, we take every function and every vulnerability type and just run the agents millions of times?
Yes.
We build a skill where a coordinator AI enumerates all possible vulnerability types and all functions, then launches parallel max effort Mythos agents against all vulnerability x function pairs.
I've been doing something like this with Opus already. General code review. Enumerated dimensions like correctness, security, maintainability, etc. Asked the coordinator AI to explore the code and autodiscover subsystem boundaries. Then it runs an absurd amount of dimension x subsystem review agents.
It burns a lot of tokens and takes me like three days to complete a review session, but the results have been excellent so far. The resulting TODO list will keep me occupied for quite a while.
I can only imagine what these corporations with unlimited money are doing. Poor me can't afford API prices so I had to not only limit scope but also design a filesystem-like journaling mechanism for the agents in order to deal with the rate limit interruptions. I'm sure Cloudflare is not gonna have that problem.