If that's the point they are making, let's see their false positive rate that it produces on the entire codebase.
They measured false negatives on a handful of cases, but that is not enough to hint at the system you suggest. And based on my experiences with $$$ focused eval products that you can buy right now, e.g. greptile, the false positive rate will be so high that it won't be useful to do full codebase scans this way.