The refactor step is the silent casualty in AI-assisted TDD. Once the test is green, Claude optimizes for moving to the next test, not for cleaning up the impl that just passed. An "iterate-until-clean" pass at the end is a different thing: you're refactoring cold code, not refactoring with a freshly-written test as the safety net.
When I first used agentic coding I was already doing strict TDD, so I just tried using it for the refactor step.
It sucked so hard I thought the idea of agentic coding was just a joke. I've tried it periodically since and it has literally never stopped sucking.
I figure if it can't do that part it isn't worth using for any part.
Ever since then, whenever people tell me it's gotten better, I've tried it out and nope, still sucks.
I still get gaslit about how well it works by people who just discovered TDD, though. They watch it power through CRUD boilerplate and come away impressed, blissfully unaware that boilerplate spew is an antipattern.