That remains to be seen.
It's notable that Anthropic are still using SWE-bench as a coding benchmark rather than the newer, more difficult DeepSWE, which shows them well behind GPT 5.5.
https://deepswe.datacurve.ai/
Bear in mind that marketing achievements such as solving an Erdős problem are the result of concerted RL training to impart those narrow capabilities. How far any benchmark results, or "early access" paid-shill vibe reports, reflect improved performance on more general, real-world use cases remains to be seen.
For how long though? The past two months have seen a ridiculous number of model releases.
Well, I have just tested it and GPT 5.5 is still smarter. It catches bugs that Fable doesn’t. Anthropic’s Fable is basically still as sloppy as Opus 4.x. I also got the downgrade for “cyber violations” while trying to build a custom Debian ISO… that tells me their safeguards are sh**. I didn’t ask it to hack anything, just to write a script that builds a custom Debian distribution with various settings… so this Fable thing already seems like a flop & slop. That warning plus the privacy change is the wake-up call to move away from Anthropic.
Why don't you think that? What I've read is that other models can find the same bugs.