Hacker News

> DS stopped every 30 minutes or so, saying it did full RE and it should all work now, while in fact, it didn't complete even 1% of it. It also looked for shortcuts again and again, despite me prompting heavily that the specific shortcut may not be used. It was a complete and utter failure.

This is my experience with non-SOTA models across the board. When you try them on little tasks and they work, it feels amazing, but then you go deeper and you're back to going in loops and fighting the model for hours.

Switching back to a SOTA model immediately yields progress again.

When I read all of the comments from people saying they can't tell a difference between Opus and <insert open-weight model here>, I don't know if they haven't really used it much yet or if they're just not doing anything complicated.