Loved this “lead bullets” framing, especially the parts on taint checking, scanners, and pre-processing/sampling logs. One practical add-on to the "Sensitive data scanners" section is verification: can you tell which candidates are actually live creds?
We’ve been working on an open source tool, Kingfisher, that pairs fast detection (Hyperscan + Tree-Sitter) with live validation for a bunch of providers (cloud + common SaaS) so you can down-rank false positives and focus on the secrets that really matter. It plugs in at the chokepoints this post suggests: CI, repo/org sweeps, and sampled log archives (stdin/S3) after a Vector/rsyslog hop.
Examples:
kingfisher scan /path/to/app.log --only-valid
kingfisher scan --s3-bucket my-logs --s3-prefix prod/2025/09/
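For anyone who wants the detect-then-validate idea in miniature, here is a rough Python sketch. It is not how Kingfisher works internally: the single regex, the `Candidate` type, and the `fake_validate` stub are all made up for illustration. A real validator would make an authenticated call to the provider instead.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable, List

# Illustrative pattern only: AWS access key IDs commonly look like
# "AKIA" followed by 16 uppercase letters or digits.
AWS_KEY_ID = re.compile(r"\b(AKIA[0-9A-Z]{16})\b")


@dataclass
class Candidate:
    line_no: int
    secret: str
    live: bool = False


def detect(lines: Iterable[str]) -> List[Candidate]:
    """Stage 1: cheap pattern matching over every line."""
    found = []
    for n, line in enumerate(lines, start=1):
        for m in AWS_KEY_ID.finditer(line):
            found.append(Candidate(n, m.group(1)))
    return found


def rank(cands: List[Candidate], validate: Callable[[str], bool]) -> List[Candidate]:
    """Stage 2: validate each candidate, then list live ones first."""
    for c in cands:
        c.live = validate(c.secret)
    # sorted() is stable, so order within each group is preserved.
    return sorted(cands, key=lambda c: not c.live)


def fake_validate(secret: str) -> bool:
    """Hypothetical stand-in for a provider API check."""
    return secret.endswith("LIVE")


if __name__ == "__main__":
    log = [
        "user=bob key=AKIAIOSFODNN7EXAMPLE",
        "nothing here",
        "token AKIAABCDEFGHIJKLLIVE end",
    ]
    for c in rank(detect(log), fake_validate):
        print(c.line_no, c.secret, "live" if c.live else "unverified")
```

The point of the split is that detection stays fast and offline, while validation only runs on the handful of candidates that matched.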
Baselines help keep noise down over time.
Repo: https://github.com/mongodb/kingfisher (Apache-2.0)
Disclosure: I help maintain Kingfisher.