I have been testing agentic coding with Claude 4.5 Opus and the problem is that it's too good at documentation and test cases. It's thorough to the point of going out of scope, so I have to edit the output down to improve the signal-to-noise ratio.
The “change capture”/straitjacket-style tests LLMs like to output drive me nuts. But humans write those all the time too, so I shouldn't be that surprised either!
What do these look like?
These tests also break encapsulation in many cases, because they're not testing the interface contract; they're testing the implementation.
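Something like this, a minimal sketch with a made-up Cart class just to make it concrete: the first test pins private state and which internal helper got called, so any refactor breaks it even when behaviour hasn't changed; the second only asserts the public contract.

```python
from unittest.mock import patch


class Cart:
    """Tiny stand-in class, only here so the tests below actually run."""
    def __init__(self):
        self._items = []

    def add_item(self, name, price, qty):
        self._items.append({"name": name, "price": price, "qty": qty})

    def _apply_discounts(self, subtotal):
        return subtotal  # no discounts in this sketch

    def total(self):
        subtotal = sum(i["price"] * i["qty"] for i in self._items)
        return self._apply_discounts(subtotal)


# "Change capture" style: pins internals, so any refactor breaks it
# even when the observable behaviour is unchanged.
def test_total_implementation_coupled():
    cart = Cart()
    cart.add_item("widget", price=10, qty=3)
    # asserts on the private data structure
    assert cart._items == [{"name": "widget", "price": 10, "qty": 3}]
    # asserts on which private helper was called, not on what the result means
    with patch.object(Cart, "_apply_discounts", return_value=30) as spy:
        assert cart.total() == 30
        spy.assert_called_once()


# Behaviour-focused alternative: only the public contract is asserted.
def test_total_behaviour():
    cart = Cart()
    cart.add_item("widget", price=10, qty=3)
    assert cart.total() == 30
```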
Juniors on one of the teams I work with only write this kind of test. It's tiring; I keep telling them to test the behaviour, not the implementation, and yet every time they do the same thing. Or rather, their AI IDE spits these out.
You beat me to it, and yep these are exactly it.
“Mock the world, then test your mocks”: after nearly two decades of doing this professionally, I'm simply not convinced these have any value at all.
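To sketch what I mean (purely hypothetical names): every collaborator is mocked, and the assertions just restate how the mocks were wired, so nothing real is exercised and the test passes or fails in lockstep with the implementation.

```python
from unittest.mock import MagicMock


def send_welcome_email(user_repo, mailer, user_id):
    """Hypothetical function under test."""
    user = user_repo.get(user_id)
    mailer.send(to=user.email, template="welcome")


def test_send_welcome_email_mock_the_world():
    # mock every collaborator...
    user_repo = MagicMock()
    mailer = MagicMock()
    user_repo.get.return_value.email = "a@example.com"

    send_welcome_email(user_repo, mailer, user_id=42)

    # ...then assert the mocks were called exactly as wired above.
    # No real query, no real template, no real email ever runs.
    user_repo.get.assert_called_once_with(42)
    mailer.send.assert_called_once_with(to="a@example.com", template="welcome")
```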
If the goal is to document the code and it gets sidetracked, focusing on only certain parts, then it failed the test. It just further proves LLMs are incapable of grasping meaning and context.