The previous company I was working at (6 months ago) had a bunch of microservices, most in Python using FastAPI and pydantic. At one point the security team turned on CodeQL for a bunch of them, and we immediately got a pile of false positives for not validating a UUID URL path param in a request handler. In fact the parameter was typed in the handler function signature, and FastAPI does validate that type. But in this strange case, CodeQL knew these were external inputs but didn't know that FastAPI would validate that path param type, so it suggested adding redundant type-check and bail-out code in hundreds of places.
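To illustrate (the route and handler names here are made up, but this is roughly the shape of it): FastAPI parses and validates the path segment against the parameter's annotation before the handler body ever runs, and a malformed value is rejected with a 422, so the extra guard CodeQL wanted could never actually fire.

```python
from uuid import UUID

from fastapi import FastAPI

app = FastAPI()

@app.get("/items/{item_id}")
async def get_item(item_id: UUID):
    # By the time this body runs, FastAPI/pydantic have already parsed and
    # validated the path segment against the UUID annotation; a request
    # like /items/not-a-uuid gets a 422 response and never reaches here.
    return {"item_id": str(item_id)}

# The kind of redundant guard CodeQL suggested adding inside the handler
# (illustrative, not our actual code):
#
#     if not isinstance(item_id, UUID):
#         raise HTTPException(status_code=400, detail="invalid id")
#
# ...even though item_id is already guaranteed to be a UUID at that point.
```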

The patterns we had established were as simple, basic, and "safe" as practical, and we advised on and code-reviewed the mechanics of the services/apps for the other teams: using database connections/pools correctly, using async correctly, validating input correctly, etc. (while those teams focused more on features and business logic). Low-level performance was not really a concern; what mattered was high-level stuff like DB queries or sub-requests that were too expensive or too numerous. The point is, there really wasn't much of anything for CodeQL to find, because the basic blunders were mostly prevented already. So it was pretty much all false positives.

Of course, the experience would be far different if we had been more careless or were working with trickier components/patterns. Compare it to the base-rate fallacy from medicine: run a 99%-accurate test across a population with nothing for it to find, and the "1%" false-positive case will dominate the results.
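To put toy numbers on that (invented, not from any real scan):

```python
# Base-rate illustration: a "99% accurate" check run over a codebase
# that has nothing real for it to find.
sites_checked = 10_000       # places the scanner inspects
real_vulns = 0               # what is actually there to find
false_positive_rate = 0.01   # the "1%" of a 99%-accurate check

true_alerts = real_vulns                                   # nothing real to flag
false_alerts = (sites_checked - real_vulns) * false_positive_rate

print(f"{true_alerts} true alerts, ~{false_alerts:.0f} false alerts")
# -> 0 true alerts, ~100 false alerts: every alert is noise, no matter
#    how impressive the per-check accuracy sounds.
```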

I also want to mention a tendency for some security teams to decide that their role is to set these things up, turn them on, cover their eyes, and point the hose at the devs. Using these tools makes sense, but those teams conclude it's not practical for them to look at the output first and judge its quality with their own brains. And it's all about the numbers: 80 criticals, 2000 highs! (except they're all the same CVE and they're all invalid for the same reason)

Interesting, thanks. In the UUID example you mentioned, it seems the CodeQL model is missing some information about how FastAPI’s runtime validation works and so not drawing correct inferences about the types. It doesn’t seem to have a general problem with tracking request parameters coming into Python web frameworks — in fact, the first thing that really impressed me about CodeQL was how accurate its reports were with some quite old Django code — but there is a lot more emphasis on type annotations and validating input against those types at runtime in FastAPI.
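For a concrete contrast (a sketch with invented names, assuming an ordinary Django project around it): in Django the constraint on a path parameter typically lives in the URLconf, whereas in FastAPI it lives in the handler's type annotation and is only enforced by pydantic at request time, which seems to be the piece the CodeQL model isn't accounting for.

```python
# urls.py sketch -- an ordinary Django project is assumed around this.
from django.http import JsonResponse
from django.urls import path

def get_item(request, item_id):
    # The <uuid:...> converter below has already turned the path segment
    # into a uuid.UUID; a request with a malformed id simply doesn't match
    # this route, so the view never sees it.
    return JsonResponse({"item_id": str(item_id)})

urlpatterns = [
    path("items/<uuid:item_id>/", get_item),
]
```

The equivalent FastAPI handler carries that same constraint purely in its `item_id: UUID` annotation, which only means anything because of FastAPI's runtime validation.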

I completely agree about the problem of someone deciding to turn these kinds of scanning tools on and then expecting they’ll Just Work. I do think the better tools can provide a lot of value, but they still involve trade-offs and no tool will get everything 100% right, so there will always be a need to review their output and make intelligent decisions about how to use it. Scanning tools that don’t provide a way to persistently mark a certain result as incorrect or to collect multiple instances of the same issue together tend to be particularly painful to work with.