Consider that SWE benchmarking is done mainly with Python code. That tells you something