> Sashiko was able to find around 53% of bugs
That's cool. Another interesting metric, however, would be the false positive ratio: like, I could just build a bogus system that simply marks everything as a bug and then claim "my system found 100% of all bugs!"
In practice, a bug-finding system's precision matters as much as its recall: if something like Sashiko spams human reviewers with piles of alleged bug reports, most of which turn out not to be bugs at all, that noise ties up resources and could undermine trust in the usefulness of the system.
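To make the difference concrete, here's a quick sketch with completely made-up numbers (100 changes, 10 real bugs; nothing here comes from Sashiko's actual data, and `precision_recall` is just a helper I named for illustration). It compares the "flag everything" system against a more selective one:

```python
def precision_recall(flagged, actual_bugs):
    """Return (precision, recall) for a set of flagged items."""
    true_positives = len(flagged & actual_bugs)
    # Precision: what fraction of the flagged items are real bugs?
    precision = true_positives / len(flagged) if flagged else 0.0
    # Recall: what fraction of the real bugs were flagged?
    recall = true_positives / len(actual_bugs) if actual_bugs else 0.0
    return precision, recall


# Hypothetical data: 100 changes, 10 of which contain a real bug.
all_changes = set(range(100))
actual_bugs = set(range(10))

# The bogus system simply flags every change as a bug.
flag_everything = all_changes
# A made-up selective system flags 6 changes, 5 of them real bugs.
selective = {0, 1, 2, 3, 4, 50}

bogus_precision, bogus_recall = precision_recall(flag_everything, actual_bugs)
sel_precision, sel_recall = precision_recall(selective, actual_bugs)

print(f"flag everything: precision={bogus_precision:.2f} recall={bogus_recall:.2f}")
print(f"selective:       precision={sel_precision:.2f} recall={sel_recall:.2f}")
```

The bogus system gets perfect recall, but nine out of ten of its reports are noise, which is exactly the part a recall-only headline number hides.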
They mention false positives on GitHub as well: "The rate of false positives is harder to measure, but based on limited manual reviews it's well within 20% range and the majority of it is a gray zone."