Hacker News

I personally don't put any weight on DeepSWE. Other than 5.5 being directionally the best model, it gets the others pretty wrong in my experience. FrontierCode from Cognition looks interesting.