Agreed!
I've spent the last few months experimenting with both Claude Code and Cursor. For simple tasks, like identifying a bug given console output, both are pretty good. But for a big change, like adding a brand-new feature to existing code that touches lots of files and needs tests written, it usually makes at least a few mistakes I catch on review, and prompting the model to fix those mistakes often leads it to fix things in strange ways.
A few days ago, I had a bug I just couldn't figure out. I prompted Claude to diagnose and fix the issue, but after 5 minutes or so of trying different ideas, rerunning the test, and getting stuck just like I did, it simply turned the test off and called it complete. If I hadn't been watching what it was doing, I could have missed that and deployed bad code.
For the last week or so, I've switched entirely from relying on prompting to writing the code myself and using tab complete to autocomplete maybe 80% of it. It's slower, but I have more control, and honestly it's a much more enjoyable experience.
Drop in a lint rule to fail on skipped tests. I've added these at a previous job after finding that tests skipped during dev sometimes slipped through review and got merged.
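If you happen to be on a Jest + ESLint stack (an assumption; yours may differ), eslint-plugin-jest already ships rules for this. A minimal flat-config sketch; the test-file glob is just an example to adapt:

```js
// eslint.config.mjs — sketch assuming ESLint 9+ flat config and eslint-plugin-jest
import jest from 'eslint-plugin-jest';

export default [
  {
    files: ['**/*.test.{js,ts}'], // adjust to your test file naming convention
    plugins: { jest },
    rules: {
      // Errors on describe.skip / it.skip / xit / xdescribe
      'jest/no-disabled-tests': 'error',
      // Also catches a stray .only left behind after debugging one test,
      // which silently skips everything else
      'jest/no-focused-tests': 'error',
    },
  },
];
```

Run it in CI so a skipped test fails the build, rather than just warning in someone's editor.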