Hacker News

I don’t have examples, but I have an LLM-driven project with like… 2500 tests, and I regularly need to prune:

* no-op tests

* unit tests labeled as integration tests

* tests marked as skipped because they were failing and the agent didn’t want to fix them

* tests that can never fail
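
To make a few of these concrete, here is a rough sketch of how some of them could be flagged automatically. It is not the tooling used on the project above. It assumes Python tests in pytest or unittest style, and the function name `find_suspect_tests` and the sample tests are made up for illustration. It only reads the source with the standard `ast` module, so it catches the obvious cases: tests with no assertion at all, asserts on a truthy constant, and tests with a skip decorator. It does not try to detect unit tests mislabeled as integration tests, and the subtler "can never fail" cases would need something heavier, such as mutation testing.

```python
import ast


def find_suspect_tests(source: str) -> dict[str, list[str]]:
    """Return {test_name: [reasons]} for tests that look bogus (heuristic only)."""
    suspects: dict[str, list[str]] = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        if not node.name.startswith("test"):
            continue

        reasons: list[str] = []
        has_check = False
        # Walk only the body, so decorators are not mistaken for checks.
        for stmt in node.body:
            for child in ast.walk(stmt):
                if isinstance(child, ast.Assert):
                    has_check = True
                    # `assert True`, `assert 1`, ... can never fail.
                    if isinstance(child.test, ast.Constant) and child.test.value:
                        reasons.append("assert on constant")
                elif isinstance(child, ast.Call):
                    func = child.func
                    if isinstance(func, ast.Attribute):
                        name = func.attr
                    else:
                        name = getattr(func, "id", "")
                    # Covers self.assertEqual(...), pytest.raises(...), self.fail(...).
                    if name.startswith("assert") or name in ("raises", "fail"):
                        has_check = True
        if not has_check:
            reasons.append("no assertions")

        # Any decorator mentioning "skip", e.g. pytest.mark.skip or unittest.skip.
        if any("skip" in ast.unparse(dec) for dec in node.decorator_list):
            reasons.append("skipped")

        if reasons:
            suspects[node.name] = reasons
    return suspects


# Hypothetical test file; it is only parsed, never executed.
SAMPLE = '''
import pytest

def test_noop():
    pass

def test_always_passes():
    assert True

@pytest.mark.skip(reason="flaky")
def test_skipped():
    assert add(1, 2) == 3

def test_real():
    assert add(1, 2) == 3
'''

if __name__ == "__main__":
    for name, reasons in find_suspect_tests(SAMPLE).items():
        print(name, reasons)
```

This needs Python 3.9+ for `ast.unparse`. Anything it reports is a candidate for a human look, not proof that the test is bogus.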

Probably at any given time 2-4% of the tests are broken. I’d say about 10% of one-shot tests are bogus if you’re just working with a spec + chat and don’t have extra testing harnesses.