It's super obvious even if you try and use something like agent mode for coding, it starts off well but drifts off more and more. I've even had it try and do totally irrelevant things like indent some code using various Claude models.

My favourite example is something that happens quite often even with Opus, where I ask it to change a piece of code, and it does. Then I ask it to write a test for that code, it dutifully writes one. Next, I tell it to run the test, and of course, the test fails. I ask it to fix the test, it tries, but the test fails again. We repeat this dance a couple of times, and then it seemingly forgets the original request entirely. It decides, "Oh, this test is failing because of that new code you added earlier. Let me fix that by removing the new code." Naturally, now the functionality is gone, so it confidently concludes, "Hey, since that feature isn't there anymore, let me remove the test too!"

"Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away." - Claude, probably