> Curiously, Opus 4.7 claims to have an 87.6% pass rate and Mythos claims to have a 93.9% pass rate... leading to the conclusion that it's actually possible to "solve" the problems that OpenAI claims are incorrect.

Huh, that is very curious indeed. If Anthropic really claims those pass rates while OpenAI claims the test cases are flawed and broken, then clearly one of them isn't telling the whole story...

Oops, sorry, I moved this portion of the comment to a top-level comment at the same time you were replying, since the part that was replying to GP was already addressed well in a simultaneous comment.

https://news.ycombinator.com/item?id=47911074

Citation for the claimed pass rates is: https://llm-stats.com/benchmarks/swe-bench-verified