Hacker News

Out of curiosity, did you give a test for them to validate the code?

I had a test failing because I introduced a silly comparison bug (> instead of <), and claude 4.6 opus figured out that the problem was the code, not the test, and fixed the bug (which I had missed).

lm28469 2 months ago

There was a test and a very useful golang error that literally explained what was wrong. The models tried implementing a solution and failed, and when I pointed out the error, most of them just rolled back the "solution".

frde_me 2 months ago

Which exact models were you using, and with what settings? 4.6 / 5.3 codex, both with thinking / high modes?

lm28469 2 months ago

minimax 2.5, kimi k2.5, codex 5.2, gemini 3 flash and pro, glm 4.7, devstral2 123b, etc.

Izikiel43 2 months ago

Ok, thanks for the info
