As of today though, that doesn't work. Even straightforward tasks that are perfectly spec-ed can't be reliably done with agents, at least in my experience.

I recently used Claude for a refactor. I had an exact list of call sites, with positions etc. The model had to add .foo to a bunch of builders that were either at that position or slightly before (the code position was for .result() or whatever.) I gave it the file and the instruction, and it mostly did it, but it also took the opportunity to "fix" similar builders near those I specified.

That is after iterating a few times on the prompt (first time it didn't want to do it because it was too much work, second time it tried to do it via regex, etc.)