Hacker News

This benchmark paints a very different picture, with GPT 5.5 at the very top at 70% and DeepSeek at 8%.

https://deepswe.datacurve.ai

DeepSWE has been heavily criticized, though (https://github.com/datacurve-ai/deep-swe/issues/21). Putting GPT 5.5 on top is the obviously correct part, but everything else about it makes very little sense.