I find coding agents can produce very high quality tests if and only if you give them detailed guidance and good starting examples.
Ask a coding agent to build tests for a project that has none and you're likely to get all sorts of messy mocks and tests that exercise internals when really you want them to exercise the top level public API of the project.
Give them just a few starting examples that demonstrate how to create a good testable environment without mocking and test the higher level APIs and they are much less likely to make a catastrophic mess.
You're still going to have to keep an eye on what they're doing and carefully review their work though!
> I find coding agents can produce very high quality tests if and only if you give them detailed guidance and good starting examples.
I find this to be true for all AI coding, period. When I have the problem fully solved in my head, and I write the instructions to explicitly and fully describe my solution, the code that is generated works remarkably well. If I am not sure how it should work and give more vague instructions, things don't work so well.
Yeah, same. Usually I'll ask the agent for a few alternatives, to make sure I'm not missing something, but the solution I wanted tends to be the best one. I also get into a lot of me saying "hm, why are you doing it that way?" "Oh yeah, that isn't actually going to work, sorry".
Yes, but the act of writing code is an important part of figuring out what you need. So I’m left wondering how much of a prefect the AI can actually help with. To be clear I do use AI for some code gen. But I try to use it less than I see others use it.
Eh, I think my decades of experience writing my own code was necessary for me to develop the skills to be able to precisely tell the AI what to build, but I don't think I need to (always) write new code to know how to know what I need.
Now, if the thing I am building requires a technology I am not familiar with, I will spend some time reading and writing some simple test code to learn how it works, but once I understand it I can then let the AI build from scratch.
Of course, this does rely on the fact that I have years of coding experience that came prior to AI, and I do wonder how new coders can do it without putting in the work to learn how to build working software without AI before using AI.
It’s not just about new tech. It’s about new businesses and projects.
I feel like that leaves me with the hard part of writing tests, and only saves me the bit I can usually power through quickly because it's easy to get into a flow state for it.
Left to his own devices, I found Claude liked to copy the code under test into the test files to 'remove dependencies' :/
Or would return early from playwright tests when the desired targets couldn't be found instead of failing.
But I agree that with some guidance and a better CLAUDE.md, can work well!
I've think they're also much better at creating useful end to end UI tests than unit or integration tests, but unfortunately those are hard to create self contained environments for without bringing a lot of baggage and docker containers, which not all agent VMs might support yet. Getting headless QT running was a pain too, but now ChatGPT Codex can see screenshots and show them in chat (Claude Code can't show them in the chat for some reason) and it's been generating much better end to end tests than I've seen for unit/integration.
Has anyone had success with specific prompts to avoid the agent over-indexing on implementation details? For instance, something like: "Before each test case, add a comment justifying the business case for every assumption made here, without regards to implementation details. If this cannot be made succinct, or if there is ambiguity in the business case, the test case should not be generated."
I've had reasonable success from doing something like this, though it is my current opinion that it's better to write the first few tests yourself to establish a clear pattern and approach. However, if you don't care that much (which is common with side projects):
Starting point: small-ish codebase, no tests at all:
and etc. For a project with an existing and mature test suite, it's much easier: I've also found it helpful to put things in AGENTS.md or CLAUDE.md about tests and my preferences, such as: I do want to stress that every project and framework is different and has different needs. As you discover the AI doing something you don't like, add it to the prompts or the AGENTS.md/CLAUDE.md. Eventually it will get pretty decent, though never blindly trust it because a butterfly flapping it's wings in Canada sometimes causes it to do unexpected things.Does it depend on the model? I would have expected the bigger ones to be better with common sense and not fixating on irrelevant details. But I have only used them with quite small codebases so far. (Which have basically no internals to exercise!)
Indeed the case - luckily my codebase had some tests already and a pretty decent CLAUDE.md file so I got results I’m happy with.
I was able to do this with vitest and a ton of lint rules.