This is where you're wrong, I ran an experiment and told it to find bugs in a ~200 LoC project. The models are tuned in a way to where they're expected to generate issue reports so a codebase that had zero bugs, zero vulnerabilities and zero changes needed it found 3 low severity issues (cosmetic) 1 medium severity issue and 1 critical severity issue. The critical severity issue was accepting unvalidated user input, for... an echo command.

Did you make any attempt at tuning the prompt to reduce false positives? Or did you just say "find bugs"? Because if you tell it to do that, it will.

The point I was trying to make is that there will always be people reporting "critical" bugs.