> most halfway decent models can write damn good code for a fraction of the price
The problem isn't what they do from a blank slate. It's how they get there, and the edge cases. Some models also take longer (use more steps), i.e. they end up costing more despite being "cheaper".
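Back-of-envelope version of the "cheaper but costs more" point. The prices, token counts, and step counts here are made up for illustration, not real pricing or measurements, and `task_cost` is just a hypothetical helper:

```python
def task_cost(price_per_mtok: float, tokens_per_step: int, steps: int) -> float:
    """Total dollars for one task: price per million tokens x tokens used."""
    return price_per_mtok * tokens_per_step * steps / 1_000_000


# Hypothetical numbers only: the "cheap" model is 5x cheaper per token
# but needs 6x the steps because it keeps trying and backing out.
cheap = task_cost(price_per_mtok=3.0, tokens_per_step=20_000, steps=60)
strong = task_cost(price_per_mtok=15.0, tokens_per_step=20_000, steps=10)

print(f"cheap model:  ${cheap:.2f}")
print(f"strong model: ${strong:.2f}")
```

With those assumed numbers the per-token discount is eaten by the extra steps, and that's before counting your own time babysitting it.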
I've seen models:
- Back out of plans non-stop. They try the obvious path, invent some X/Y/Z excuse (without verifying it) for why it can't be done, note that down and move on. The fix could be as simple as downloading from site B because site A is down, nothing more.
- Hack the test to make it pass. Code is wrong? Nah, let's update the test.
- Keep saying useless things like YAGNI and making infinite excuses like "too risky", so they never do the work.
- Claim they're done when there are 100 edge cases not covered. When you try to use it, it fails in cases where you, as a human, would assume it just works. You can write a spec to cover it all, but then what's the point?
- Be trigger-happy and never investigate. Try it. 5 minutes. Oh, it failed. Back out. Repeat. Better models definitely spend more time analyzing and actually "think". I've had models spend hours on a change this way when an actual investigation (a code walkthrough) might have solved it.
- Fail to know and use the right tools. A lot of lesser models have infinite fear, e.g. "oh, docker might not be available" (it is) or this and that, even if you nudge them, and they spend a lot of extra time "working around" it.
The list goes on. Better models definitely help.
The only thing I agree on is that no, you don't need Fable, but saying Sonnet can do the job instead of Opus is a different story. It's so obvious when Sonnet touches the code that I can't give it more than 5 minutes. It lies. Doesn't check. Forgets things and then messes up.