The abundance and availability of open source software, together with the ease of obtaining working exploits for newly discovered vulnerabilities, make continuous vulnerability scanning and patching one of the most critical tasks in today’s cloud native environments. Performing regular scans of hosts, container images, container registries, and Lambda functions enables SecDevOps to substantially eliminate common security risks by ensuring their environment is always up-to-date and vulnerability-free.
In this blog post, we explore the challenges of building a robust, accurate, and efficient cloud native vulnerability scanning solution, and examine one of the common pains that arises in many scanning tools.
Breaking down the security scanning process
Detecting public vulnerabilities in software is usually composed of two separate phases:
- Accurate vulnerability feed creation – generating an accurate, up-to-date vulnerability feed with high coverage, based on open-source and proprietary data
- Package metadata extraction – detecting packages, installed products, and the correlations between the two (e.g., openssl version 1.0.2g is used by nginx 1.9). Such metadata is retrieved for packages on hosts, within container images, and within serverless function bundles.
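To make the second phase concrete, here is a minimal sketch of package metadata extraction for one common case: parsing a Debian-style dpkg status file (such as /var/lib/dpkg/status inside a container image) into a name-to-version mapping. The sample file content is hypothetical; a real scanner must also handle rpm, apk, language-level package managers, and statically compiled software.

```python
# Minimal sketch: extract installed package metadata from a Debian-style
# dpkg status file. The sample content below is hypothetical.

def parse_dpkg_status(content: str) -> dict[str, str]:
    """Return a mapping of package name -> installed version."""
    packages = {}
    name = version = None
    for line in content.splitlines():
        if line.startswith("Package: "):
            name = line[len("Package: "):].strip()
        elif line.startswith("Version: "):
            version = line[len("Version: "):].strip()
        elif line == "":  # a blank line ends a package stanza
            if name and version:
                packages[name] = version
            name = version = None
    if name and version:  # the last stanza may not end with a blank line
        packages[name] = version
    return packages

sample = """\
Package: openssl
Status: install ok installed
Version: 1.0.2g-1ubuntu4

Package: nginx
Status: install ok installed
Version: 1.9.15-0ubuntu1
"""
print(parse_dpkg_status(sample))
# → {'openssl': '1.0.2g-1ubuntu4', 'nginx': '1.9.15-0ubuntu1'}
```

The resulting mapping is what gets matched against the vulnerability feed in the first phase.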
Both phases come with unique and non-trivial challenges, e.g., completing blazing-fast scans, handling statically compiled and custom libraries, handling various packaging systems, storing and distributing feeds in a compact manner, and ensuring the resulting data feed is free of false positives, which is the main focus of this blog post.
Building an accurate, robust vulnerability feed
The quality of the vulnerability feed is usually measured by the following metrics:
- False positive rate – ensuring that the matched vulnerabilities are correct
- False negative rate – ensuring that all applicable vulnerabilities are matched
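Both metrics can be computed by comparing a scanner's output against a hand-labeled ground truth. The sketch below shows one simple way to do that; the CVE IDs are hypothetical and the set-based comparison is a deliberate simplification.

```python
# Minimal sketch: measure a scanner's false positive and false negative
# rates against a hand-labeled ground truth. All CVE IDs are hypothetical.

def feed_quality(reported: set[str], ground_truth: set[str]) -> tuple[float, float]:
    false_positives = reported - ground_truth   # matched, but not actually applicable
    false_negatives = ground_truth - reported   # applicable, but missed
    fp_rate = len(false_positives) / len(reported) if reported else 0.0
    fn_rate = len(false_negatives) / len(ground_truth) if ground_truth else 0.0
    return fp_rate, fn_rate

reported = {"CVE-2017-0001", "CVE-2017-0002", "CVE-2017-0003", "CVE-2017-0004"}
truth    = {"CVE-2017-0001", "CVE-2017-0002", "CVE-2017-0005"}
print(feed_quality(reported, truth))  # 2 of 4 reports are wrong; 1 of 3 real CVEs missed
```

In practice the two metrics pull in opposite directions: aggressive matching rules drive the false negative rate down while pushing the false positive rate up.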
The problem becomes more acute in the cloud native ecosystem, where the deployment environment is composed of a large ensemble of different operating systems, programming languages, and closed- and open-source products. Ensuring all those variations are covered by the vulnerability feed and that no critical vulnerability is missed requires finding, downloading, and cleansing multiple data sources. As with any pipeline that ingests large amounts of open-source data from a variety of sources, there is always a risk of data discrepancies, e.g., bad package formatting, incorrect vulnerability rules, and more. Since customers rely heavily on the correctness of the vulnerability data (e.g., by blocking the CI system or preventing an image from being deployed to production), any false positive can trigger a production outage. It is therefore highly important to apply smart mechanisms for dealing with data discrepancies in the various data sources before the feed is published to customers.
One of the major vulnerability data sources is the National Vulnerability Database. Commonly known as the NVD, it is a U.S. government database created and maintained by NIST. The NVD is a superset of CVE (Common Vulnerabilities and Exposures), a list of publicly known security vulnerabilities maintained by MITRE. CVE is accepted internationally by the security community as the de facto database for vulnerabilities. Users and developers of both closed- and open-source software attempt to issue CVE IDs for all security issues found.
The NVD is synchronized with CVE and adds an entry for every assigned CVE, including an analysis of the vulnerability, additional information, and severity scoring. It is not uncommon for the NVD to be fed automatically into threat intelligence and vulnerability management software. Thus, errors in either CVE or the NVD could put users at risk of being exposed to publicly known vulnerabilities. The implications of such errors range from mild to critical depending on the severity of the vulnerability, as there are always malicious actors actively trying to exploit known CVEs.
The Twistlock approach
At Twistlock, we make an effort to fix any errors we find in the NVD or other data sources. One example of such an issue is CVE-2017-8399 in PCRE2, where we found a discrepancy between the CVE description and the CVE rules. In the NVD, each CVE carries Common Platform Enumeration (CPE) data, which specifies a unique identifier for each product and contains the list of rules to which the vulnerability applies. In this specific case, PCRE2 had the following rule: version <= 1.30. However, the description of the vulnerability clearly stated that:
PCRE2 before 10.30 has an out-of-bounds write caused by a stack-based buffer overflow in pcre2_match.c, related to a “pattern with very many captures.”
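This kind of mismatch, where the CPE rule says 1.30 while the description says 10.30, can be caught automatically. The sketch below shows one simplified form of such a cross-check: extract version-like tokens from the free-text description and flag the entry when the CPE upper bound matches none of them. The regex and function names are illustrative, not Twistlock's actual implementation.

```python
# Minimal sketch of the kind of cross-check that catches CVE-2017-8399:
# compare the version bound in a CVE's CPE rule against the versions
# mentioned in its free-text description. Simplified for illustration.
import re

def versions_in_description(description: str) -> list[str]:
    """Pull version-like tokens (e.g. '10.30') out of the CVE description."""
    return re.findall(r"\b\d+(?:\.\d+)+\b", description)

def has_discrepancy(cpe_max_version: str, description: str) -> bool:
    """Flag the entry when the CPE upper bound matches none of the
    versions named in the description."""
    return cpe_max_version not in versions_in_description(description)

description = ("PCRE2 before 10.30 has an out-of-bounds write caused by a "
               "stack-based buffer overflow in pcre2_match.c.")
print(has_discrepancy("1.30", description))   # True  -> the bad rule is flagged
print(has_discrepancy("10.30", description))  # False -> the corrected rule passes
```

A real pipeline needs fuzzier matching than exact string equality, since descriptions may phrase version ranges in many ways, but even a crude check like this surfaces gross mismatches for human review.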
Twistlock has a dedicated intelligence service, queried by our users, that performs automatic correlation between multiple data sources; when it finds a clear discrepancy, it blocks publishing and automatically alerts our vulnerability research team to investigate further.
As with all customer-facing issues, our first step is to immediately stop the bleeding: manually patch the issue and distribute the updated feed to our customers quickly and efficiently, which is an interesting engineering problem in its own right.
However, it is also our obligation to the community to fix such issues at the vendor level, a practice we follow with any open source data we consume. In the case mentioned above with PCRE2, our lead researcher Ariel Zelivansky contacted NVD and CVE, and the issue was resolved promptly.
If you ever encounter errors in either feed, we urge you to get in touch with the feed maintainers, or leave us a tip so we can take care of the issue.
Follow us on Twitter
Follow us on Twitter for real-time updates on the cloud native ecosystem, the Twistlock product, and cloud native security threats.