I had a fun day today: I had deepseek-v4-flash subagents work out a patch to the dirty frag exploit so that it gains root access on systems with AF_ALG disabled and nscd turned on. The originally published exploit wasn't working on those systems, but the patched one worked like a charm.
I am still a believer that 100 subagents with good-enough intelligence can get the same results as Mythos. I am ready for this opinion to be shattered when I eventually try Mythos, and I believe others here must have tried it out too.
That's probably true, but 100 subagents is something that costs $100/hour to run, and Mythos takes $20k to find a vulnerability. So the question isn't "can dumber models conceivably do this?" It's: if running inference with Mythos costs 5000 GPU-hours per exploit, how many GPU-hours does it cost with a dumber model?
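To make that comparison concrete, here is a rough back-of-the-envelope sketch in Python using only the figures quoted above. The function names are made up for illustration, and the implied price per GPU-hour assumes the $20k and the 5000 GPU-hours describe the same run, which the comment does not state outright.

```python
# Figures quoted in the comment above; treat them as rough estimates.
SWARM_COST_PER_HOUR = 100.0       # USD/hour for ~100 subagents
MYTHOS_COST_PER_VULN = 20_000.0   # USD per vulnerability found
MYTHOS_GPU_HOURS = 5_000.0        # GPU-hours per exploit


def break_even_hours(swarm_cost_per_hour: float, big_model_cost: float) -> float:
    """Hours the swarm can run before it costs as much as one big-model find."""
    return big_model_cost / swarm_cost_per_hour


def implied_gpu_hour_price(total_cost: float, gpu_hours: float) -> float:
    """USD per GPU-hour, ASSUMING both figures describe the same run."""
    return total_cost / gpu_hours


hours = break_even_hours(SWARM_COST_PER_HOUR, MYTHOS_COST_PER_VULN)
price = implied_gpu_hour_price(MYTHOS_COST_PER_VULN, MYTHOS_GPU_HOURS)

print(f"Swarm break-even: {hours:.0f} hours per vulnerability")
print(f"Implied Mythos price: ${price:.2f} per GPU-hour")
```

On these numbers the swarm only comes out cheaper if it finds a comparable vulnerability in under 200 hours of wall-clock time, so the open question is how its hit rate compares, not whether it can succeed at all.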