These models demonstrably have good vulnerability research capabilities.
I'm sure their marketing department is ecstatic but you guys are far more hype-based than what you're calling out.
Good, but not necessarily better than what is already available pay-as-you-go today. ref. https://www.flyingpenguin.com/the-boy-that-cried-mythos-veri...
This AISLE benchmark is interesting in this matter: https://aisle.com/blog/ai-cybersecurity-after-mythos-the-jag...
And the recently discovered Copy Fail by Xint code is another proof that the gating is overblown: https://xint.io/blog/copy-fail-linux-distributions
> demonstrably
I'm not entirely up to date on each week's LLM hype train/scandal, but last I heard there was no public access to it, and no publicly trusted third parties that can review the model's capabilities.
You are up to date. Mythos suffered unauthorized access because of poor security, but that's it as far as I know. Not exactly a good sign for something being advertised as a weapon...
You'd think if Mythos was so good at finding security issues they could point it at their own setup for it and have found those issues easily...
It’s easy to end up with no public-trusted third parties if we arbitrarily distrust third parties who say the capabilities match what’s promised. Mozilla for example says it found hundreds of Firefox vulnerabilities, and I think it’s pretty unlikely they’re lying to cover Anthropic’s back.
I think the question around the Firefox find is not that they found hundreds of vulnerabilities - they found hundreds of bugs.
What would be really interesting is a side by side Claude Opus 4.7 and Mythos comparison.