That works until you make a plan/tests/etc., set the thing loose, and then, when it has trouble, it decides "actually the pragmatic thing would be [diverge from the plan/change the tests/etc.]" and goes off the rails. I'm so frustrated by these things right now.

I honestly haven't had that problem much. Being specific, concise, and firm in your prompts helps a lot.