'Narrow scope produces better findings - Telling the model "Find vulnerabilities in this repository" makes it wander. Telling it "Look for command injection in this specific function, with this trust boundary above it, here's the architecture document and here's prior coverage of this area" makes it do something much closer to what a researcher would actually do.'

So what, we take every function and every vulnerability type and just run the agents millions of times?

I would expect Mythos to be able to find vulnerabilities without pointing them out for him, otherwise it's no better than other agents. It just has a better harness.

> So what, we take every function and every vulnerability type and just run the agents millions of times?

Yes.

We build a skill where a coordinator AI enumerates all possible vulnerability types and all functions, then launches parallel max-effort Mythos agents against every vulnerability x function pair.
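For anyone who wants to picture the fan-out, here is a minimal sketch of the pairing step. Everything in it is an assumption for illustration: the vulnerability and function lists are made up, `build_prompt` and `fan_out` are hypothetical names, and `run_agent` is a stub standing in for whatever model call you actually use.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

# Hypothetical inputs. In the setup described above, the coordinator
# would enumerate these itself by exploring the repository.
VULN_TYPES = ["command injection", "path traversal", "integer overflow"]
FUNCTIONS = ["parse_request", "run_hook"]


def build_prompt(vuln: str, func: str) -> str:
    """Build one narrow-scope prompt for a single (vulnerability, function) pair."""
    return f"Look for {vuln} in function {func}."


def run_agent(prompt: str) -> dict:
    """Stub for a real agent call; it only echoes the prompt back."""
    return {"prompt": prompt, "findings": []}


def fan_out(vuln_types, functions, max_workers=4):
    """Run one agent per pair in parallel and key the results by pair."""
    pairs = list(product(vuln_types, functions))
    prompts = [build_prompt(vuln, func) for vuln, func in pairs]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with pairs.
        results = list(pool.map(run_agent, prompts))
    return dict(zip(pairs, results))


if __name__ == "__main__":
    for pair, result in fan_out(VULN_TYPES, FUNCTIONS).items():
        print(pair, "->", result["prompt"])
```

The number of agent runs is the product of the two lists, which is why this gets expensive quickly on a real codebase.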

I've been doing something like this with Opus already, for general code review. I enumerated dimensions like correctness, security, maintainability, etc., and asked the coordinator AI to explore the code and autodiscover subsystem boundaries. Then it runs an absurd number of dimension x subsystem review agents.

It burns a lot of tokens and takes me like three days to complete a review session, but the results have been excellent so far. The resulting TODO list will keep me occupied for quite a while.

I can only imagine what these corporations with unlimited money are doing. Poor me can't afford API prices so I had to not only limit scope but also design a filesystem-like journaling mechanism for the agents in order to deal with the rate limit interruptions. I'm sure Cloudflare is not gonna have that problem.
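Not the actual mechanism, but a rough sketch of the general idea of journaling agent work so a run can resume after a rate-limit interruption. It assumes an append-only JSON Lines file; `Journal`, `run_all`, and the `worker` callback are hypothetical names, and real agent calls would replace the worker.

```python
import json
import os


class Journal:
    """Append-only log of finished tasks, one JSON object per line."""

    def __init__(self, path):
        self.path = path

    def completed(self):
        """Return the set of task ids that were fully recorded."""
        done = set()
        if not os.path.exists(self.path):
            return done
        with open(self.path, encoding="utf-8") as fh:
            for line in fh:
                try:
                    entry = json.loads(line)
                except json.JSONDecodeError:
                    # A line that fails to parse is ignored, so at worst
                    # the affected task is run again.
                    continue
                if isinstance(entry, dict) and "task_id" in entry:
                    done.add(entry["task_id"])
        return done

    def record(self, task_id, result):
        """Persist one finished task before moving on to the next."""
        with open(self.path, "a", encoding="utf-8") as fh:
            fh.write(json.dumps({"task_id": task_id, "result": result}) + "\n")
            fh.flush()
            os.fsync(fh.fileno())


def run_all(tasks, journal, worker):
    """Run every task not yet in the journal; return the ids run this time."""
    done = journal.completed()
    ran = []
    for task_id in tasks:
        if task_id in done:
            continue  # finished in an earlier, interrupted session
        journal.record(task_id, worker(task_id))
        ran.append(task_id)
    return ran
```

If the worker raises partway through (say, on a rate-limit error), calling `run_all` again with the same journal path skips whatever was already recorded and continues with the rest.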

This matches what Nicholas Carlini from Anthropic said at the [un]prompted conference: https://www.youtube.com/watch?v=1sd26pWhfmg. Well worth watching.

I think the idea here is you give the Hunters (stage 2) a narrower scope, but have a parent agent responsible for dividing up the full search space (stage 1).

And note that Hunt tasks can be queued from previous Trace tasks, i.e. you find a vuln in one layer, so you queue a hunt for corresponding vulns in the layers that could exploit your first finding.
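As I read it, that feedback loop is roughly a work queue where each finding can enqueue more hunts. A toy sketch, with heavy assumptions: the layer graph `CALLERS` is invented, and `hunt` and `trace` are stubs rather than real agents (here every hunt pretends to find something).

```python
from collections import deque

# Hypothetical layer graph: for each layer, the layers that call into it
# and could therefore be used to reach a bug found there.
CALLERS = {
    "shell_exec": ["job_runner"],
    "job_runner": ["http_api"],
    "http_api": [],
}


def hunt(layer, vuln):
    """Stub for a narrow-scope Hunter; always reports a finding."""
    return {"layer": layer, "vuln": vuln}


def trace(finding):
    """Stub for a Trace step: propose follow-up hunts in the calling layers."""
    return [(caller, finding["vuln"]) for caller in CALLERS[finding["layer"]]]


def run(seed_tasks):
    """Process hunts in FIFO order, queueing follow-ups from each finding."""
    queue = deque(seed_tasks)
    seen = set(seed_tasks)  # avoid hunting the same (layer, vuln) twice
    findings = []
    while queue:
        layer, vuln = queue.popleft()
        finding = hunt(layer, vuln)
        if finding is None:
            continue
        findings.append(finding)
        for task in trace(finding):
            if task not in seen:
                seen.add(task)
                queue.append(task)
    return findings


if __name__ == "__main__":
    for f in run([("shell_exec", "command injection")]):
        print(f["layer"], "-", f["vuln"])
```

The stage 1 parent would supply the seed tasks; the `seen` set is there so that cycles in the layer graph cannot keep the queue growing forever.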

I'm still waiting for something more specific or groundbreaking too. Feels like a lot of noise whose only goal is to get people talking about it. And now I realize I am talking about it and about nothing at the same time. Just fugazi.

Yeah this whole post reads like Anthropic said “make sure you say how awesome Mythos is” but really what they’re saying is that it’s just a better harness.

Who is him?

Raised an eyebrow for me too. It’s interesting to see people subconsciously (?) assign a gender (him/her) to LLMs rather than using the appropriate “it.”