Yeah, I'm not super happy with the chart sorting order, but trying to balance all the information is challenging. I chose not to include partials (right place, inaccurate bug description, so it smelled something funny but didn't quite understand it) in the sort order, but maybe should.

And, it does feel wrong that the unrealistically expensive model that no one in their right mind would use for anything but the most critical tasks (and even then, a committee of ten of the best alternatives would cost half as much) is at the top. But, GPT 5.5 Pro did find a bug nobody else found among the four cases it got to, hinting at some real difference. It may be closer to Mythos than others, but at an absurd price. It'd cost tens of thousands of dollars to audit all the files in a large codebase, versus maybe fifty bucks for MiMo or DeepSeek.