> Same story with tests. Who decides when it's the test that should be changed/deleted or the implementation?

Claude is remarkably good at figuring this is out. I asked it to look at a failing test in a large and messy Python codebase. It found the root cause and then asked whether the failure was either a regression or an insufficiently specified test, performed its own investigation, and found that the test harness was missing mocks that were exposed by the bug fix.

It has become amazingly good at investigating.