Balancing Precision, Recall and Performance
April 21, 2021
Interview with Paul Anderson, VP of Engineering at GrammaTech
Paul Anderson has been a software engineer for more than 30 years, and a computer scientist since the 1980s. He’s worked with many government agencies on advanced research into code security in sensitive environments such as IoT.
In this interview, he explains the three main ways of tuning SAST tools to meet the developer’s objectives: speedy findings in real time, slightly slower but more precise scans, or deeper recall to find real bugs across the software stack.
Can you explain the tradeoff between speed and depth of analysis, and provide examples of when, in the development process, this tradeoff is acceptable?
If you’re a developer making changes to the code and want to run the tool and get back results almost instantaneously, at the same rate at which you’re doing your builds or compiles, you configure the tool to find the most obvious problems first. So, for example, you’re looking for violations of a coding standard, or a function call that’s inherently insecure in the desktop run environment, or a function call that ignores the return value.
Finding violations of these rules is pretty easy and typically doesn’t require a lot of computing power. So, in these cases, you turn off some of the more sophisticated checkers and run analysis only on the small subset of code you’re working on. The results are usually not as good as when doing a whole program analysis, but this is often an acceptable tradeoff when developers make a change to a handful of files at a time and do their builds incrementally while running static analysis on only the code components that they changed.
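To make the idea of a cheap, local check concrete, here is a minimal sketch of an "ignored return value" rule. It is not how CodeSonar or any particular SAST tool works; it simply uses Python’s standard `ast` module to flag bare calls in a single piece of source text. The set `MUST_USE_RESULT` and the sample snippet are hypothetical, chosen only for illustration.

```python
import ast

# Hypothetical list of functions whose results should never be discarded.
MUST_USE_RESULT = {"write", "send", "remove"}


def find_ignored_returns(source: str) -> list[tuple[int, str]]:
    """Return (line, name) for each bare call to a function in MUST_USE_RESULT."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        # An expression statement that is only a call discards the call's result.
        if isinstance(node, ast.Expr) and isinstance(node.value, ast.Call):
            func = node.value.func
            if isinstance(func, ast.Attribute):
                name = func.attr          # e.g. sock.send(...)
            else:
                name = getattr(func, "id", None)  # e.g. send(...)
            if name in MUST_USE_RESULT:
                findings.append((node.lineno, name))
    return sorted(findings)


# Illustrative input: line 1 drops the result, line 2 keeps it.
SAMPLE = """\
sock.send(payload)
sent = sock.send(payload)
print("done")
"""

if __name__ == "__main__":
    for line, name in find_ignored_returns(SAMPLE):
        print(f"line {line}: result of {name}() is ignored")
```

A rule like this needs only the text of the file being edited, which is why checks of this kind can keep pace with incremental builds.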
And under what circumstances would developers opt for deeper, more precise analysis of their codebase?
Higher precision means fewer false positives. SAST tools give you differing levels of control over the level of analysis required to make more precise judgements on what is or is not a bug in the code. For example, you might be looking for a leak because the change that you’ve made may be risky. So, you turn on leak detection, which benefits strongly from doing a whole program analysis. If you let the CI/CD pipeline do the analysis taking into account all the modules, you’re likely to have fewer false positives than if you did the analysis on only one file.
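For readers who want the terms pinned down, the standard definitions can be written as a short sketch. Precision is the share of reported findings that are real bugs, and recall is the share of real bugs that were reported. The counts in the usage example are made up for illustration and do not come from any real scan.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Fraction of reported findings that are real bugs."""
    reported = true_positives + false_positives
    return true_positives / reported if reported else 0.0


def recall(true_positives: int, false_negatives: int) -> float:
    """Fraction of real bugs that the tool actually reported."""
    actual = true_positives + false_negatives
    return true_positives / actual if actual else 0.0


if __name__ == "__main__":
    # Hypothetical scan: 30 real bugs reported, 10 false alarms, 30 bugs missed.
    print(precision(30, 10))  # 0.75
    print(recall(30, 30))     # 0.5
```

With these definitions, removing false positives raises precision, while finding more of the real bugs raises recall.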
Can you also give us some examples of when recall is more important than speedy performance?
To get the most from a static analysis tool, it is very helpful to run it in a DevOps CI/CD pipeline. In that case, you want to configure the static analysis tool to look harder for bugs and vulnerabilities across the full stack and optimize the scans for recall.
For example, SQL injection is a tricky vulnerability to find, and detecting one typically requires tracking information across compilation unit boundaries, which means scanning code components in many different parts of the program. This kind of analysis is often referred to as hazardous information flow or taint analysis. You can set the SAST tool up to do an analysis where it looks at all parts of the program simultaneously to understand the information flow across those modules.
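The following toy sketch illustrates why this kind of finding depends on seeing more than one module. It models information flow as a small graph and searches for paths from an untrusted source to a sensitive sink that bypass a sanitizer. Every name in it (`web.read_param`, `db.run_query`, and so on) is hypothetical, and real taint analysis in a SAST tool is far more involved than this reachability search.

```python
from collections import deque

# Hypothetical data-flow edges: "module.function" -> where its value flows next.
FLOWS = {
    "web.read_param": ["service.build_filter"],
    "service.build_filter": ["db.run_query"],
    "web.read_cookie": ["service.escape"],
    "service.escape": ["db.run_query"],
}
SOURCES = {"web.read_param", "web.read_cookie"}  # untrusted input
SINKS = {"db.run_query"}                         # SQL is executed here
SANITIZERS = {"service.escape"}                  # makes data safe


def tainted_paths(flows, sources, sinks, sanitizers):
    """Return every source-to-sink path that does not pass through a sanitizer."""
    paths = []
    queue = deque([src] for src in sorted(sources))
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node in sinks:
            paths.append(path)
            continue
        for nxt in flows.get(node, []):
            # Stop at sanitizers, and avoid revisiting nodes (cycles).
            if nxt in sanitizers or nxt in path:
                continue
            queue.append(path + [nxt])
    return paths


def restrict_to_module(flows, module):
    """Keep only the edges that start and end inside one module."""
    def in_module(node):
        return node.split(".")[0] == module
    return {
        src: [dst for dst in dsts if in_module(dst)]
        for src, dsts in flows.items()
        if in_module(src)
    }


if __name__ == "__main__":
    print(tainted_paths(FLOWS, SOURCES, SINKS, SANITIZERS))
    print(tainted_paths(restrict_to_module(FLOWS, "web"), SOURCES, SINKS, SANITIZERS))
```

In this sketch the whole-program search reports the unsanitized path through three modules, while the search restricted to the `web` module alone reports nothing, because the sink lives elsewhere.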
If the scan is being run by the CI/CD pipeline, the developer has committed the change to the repository, so that’s the point at which the code is ready to build and run, and it is appropriate to do a deep scan at this stage. With modern code management tools like GitLab and GitHub, you can set things up so that the SAST tools run automatically when you commit or issue a merge request. So, for example, with GrammaTech’s integration with GitLab, the results come back in the interface with a tab that allows you to view the SAST results for the commit or merge request.
Pro Tip: Learn how GrammaTech’s CodeSonar 6.0 integrates into the GitLab CI/CD pipeline.
How do developers know when a result is a false positive?
False positives can be challenging to distinguish from true positives. CodeSonar helps by showing the path the code must take in order for that finding to be a real problem. If your program is deployed in an environment where that problem would never be triggered, this tool can help you understand that. It’s a chore to go through false positives and rule them out. But once you’ve decided something is a false positive, you can mark it as such and never have to go through it again.