What he's saying is that you should read the "Caveats and limitations" section of the article.
Here's the first one:
> Our tests gave models the vulnerable function directly, often with contextual hints (e.g., "consider wraparound behavior").
Mythos got no such help: it was cut loose and told to find vulnerabilities. If the intent was to prove that small models are just as good, they haven't demonstrated that at all. The end.