I think tests should be rewritten as much as needed. But to address the concern about invariants, maybe let the user step back and forth through past revisions and pull whatever they want into the current version, in case something important gets deleted? And then allow “pinning” of some things so they can’t be changed? Would that address your concerns?

> I think tests should be rewritten as much as needed.

Yes, I agree. The nuance is that they need to be rewritten independently, without touching the code. You can't change both at once and expect to end up with a working system: if the tests and the code move together, nothing is left to check either one against.

I'm speaking from personal experience, by the way. Today's LLMs don't enforce correctness out of the box, and agent mode has only one goal: getting things to work. I've had agent mode flip invariants in tests while it was trying to fix unit tests it had broken, and I'm talking about egregious changes, such as flipping a requirement along the lines of "normal users should not have access to the admin panel" to "normal users should have access to the admin panel". The worst part is that if agent mode is left unsupervised, it will even adjust the CSS to make sure normal users have a seamless experience going through the admin panel.
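
To make the "never change both" rule concrete, here is a minimal sketch of a check that could run on each change an agent proposes. The function names and the path convention (tests live under `tests/` or are named `test_*.py`) are assumptions for illustration, not part of any existing tool.

```python
from pathlib import PurePosixPath


def is_test_file(path: str) -> bool:
    """Assumed convention: tests live under tests/ or are named test_*.py."""
    p = PurePosixPath(path)
    return "tests" in p.parts or p.name.startswith("test_")


def mixes_tests_and_code(changed_paths: list[str]) -> bool:
    """Return True if one change set touches both test files and non-test files."""
    touches_tests = any(is_test_file(p) for p in changed_paths)
    touches_code = any(not is_test_file(p) for p in changed_paths)
    return touches_tests and touches_code


# Hypothetical change set: the agent edited a test and the code it covers.
if mixes_tests_and_code(["tests/test_admin.py", "app/admin.py"]):
    print("rejected: tests and code changed together")
```

A check like this would not catch a flipped invariant on its own, but it would force a test edit to be reviewed as a separate step from the code change it is supposed to verify.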

Agreed, that's a concern.

There could be some visual indicator of how recently the LLM-generated tests (or the code, in TDD mode) were changed. Then you'd be able to see that a failing test had been changed recently. Would that help?
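
As a rough sketch of the data behind such an indicator: given each test's outcome and the time its source was last changed, flag the failures whose test was edited within some window. The `Outcome` type, the field names, and the one-day window are all assumptions made up for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Outcome:
    name: str
    passed: bool
    last_modified: datetime  # when the test's source was last changed


def recently_changed_failures(
    outcomes: list[Outcome],
    now: datetime,
    window: timedelta = timedelta(days=1),  # arbitrary default for the sketch
) -> list[str]:
    """Names of failing tests whose source changed within `window` of `now`."""
    return [
        o.name
        for o in outcomes
        if not o.passed and now - o.last_modified <= window
    ]
```

The UI could then highlight those test names differently from failures in tests that have not been touched for a long time.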