Isn't OpenAI's public flagship already beating Mythos on penetration testing? I get the impression Mythos is just valuation-juicing for IPO more than anything else.
The fact that they haven't released it yet suggests a cost/margins issue to me more than anything else. Short term, I'll probably keep using Antrhopic, but my long-term bet is that locally-served models win, if only because the quest for profitability will probably lead to intentionally-nerfed / enshittified frontier models.
At other vendors, ad placement within LLM responses is either coming or already here. Anthropic's handling of OpenClaw shows they're willing to engage in anti-competitive behavior, and the courts are not in a hurry to stop them. Why would I pay them $200 a month for such treatment when a $2K box does what I need locally?
Mythos is dramatically better specifically at finding zero-day vulnerabilities and developing exploits for them, that being what it was designed to do. On other cybersecurity tasks, GPT-5.5 is at least as good, but finding and exploiting zero-days is a particularly scary capability, which is why Mythos is a big deal. See, e.g., https://forum.effectivealtruism.org/posts/8yztpbjuPkyXsmA6n/....
AFAIK, Antropic claims that they weren't aiming for zero-days specifically. From https://red.anthropic.com/2026/mythos-preview/ :
I've been assuming that Mythos is just a big jump in model size, and that's where the jump in capabilities comes from. Hence I expect OpenAI not to be able to catch up without scaling up the model and hence significantly raising the API prices.Anthropic frames this as something emergent. Not 100% but in a way they always phrase it as like, it’s a great model, but our breaths were swept and taken with its approach to security.
What benchmarks are you referencing that show a comparison of the models for penetration testing?
Please link to the $2k box that gives Opus level performance!