Knowing which vulnerabilities exist for a certain software product is not only interesting for penetration testing. From the perspective of an IT team, too, it can be useful to quickly obtain information about a product version in use. Until now, various databases have been available for such queries, e.g., https://nvd.nist.gov/vuln/search, https://cvedetails.com, or https://snyk.io/vuln.
However, over the past few years we have identified several issues with these databases:
- Many databases only index vulnerabilities for certain product groups (e.g., Snyk: Web Technologies)
- Many databases search for keywords in full-text descriptions, so queries for specific product versions are imprecise.
- Many databases are outdated or list incorrect information
Figure: Incorrect vulnerability results for Windows 10
Figure: Keyword search returns a different product than the one originally searched for
This is why we decided to implement our own solution. We considered the following key points:
- Products and version numbers can be searched using unique identifiers. This allows a more precise search query.
- The system performs a daily import of the latest vulnerability data from the National Institute of Standards and Technology (NIST). Vulnerabilities are thus kept up to date and have a verified CVE entry.
- The system is based on the Elastic Stack (https://www.elastic.co/de/elastic-stack/) to query and visualize data in real time.
Technical Implementation: NIST NVD & Elastic Stack
When security researchers find vulnerabilities in products, they commonly register one CVE entry per vulnerability. Each CVE entry is given a unique identifier, detailed vulnerability information, and a general description.
CVE entries can be registered at https://cve.mitre.org and are indexed in the National Vulnerability Database (NVD) in real time (https://cve.mitre.org/about/cve_and_nvd_relationship.html). NIST publishes these data sets, which contain all registered vulnerabilities, publicly and free of charge. We use this data stream as the basis for our own database.
The technical details of the data import and subsequent provisioning are illustrated as follows:
Figure: Overview of the technical components of the vulnerability database
1. Daily import of vulnerability data from the NIST NVD
The data sets are organized by year and refreshed daily by NIST. Every night, we download the latest files onto our file server.
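The following Python sketch illustrates what such a nightly download job could look like. The feed URL pattern (the legacy NVD 1.1 JSON feeds) and the target directory are assumptions for illustration, not the exact setup used in production.

```python
import gzip
import shutil
from datetime import date
from pathlib import Path
from urllib.request import urlopen

# Assumed feed location (legacy NVD 1.1 JSON feeds) and local target directory;
# both are illustrative, not the exact production setup.
FEED_URL = "https://nvd.nist.gov/feeds/json/cve/1.1/nvdcve-1.1-{year}.json.gz"
DOWNLOAD_DIR = Path("/srv/nvd/raw")


def download_feeds(first_year: int = 2002) -> None:
    """Download the gzipped per-year CVE feeds and unpack them locally."""
    DOWNLOAD_DIR.mkdir(parents=True, exist_ok=True)
    for year in range(first_year, date.today().year + 1):
        target = DOWNLOAD_DIR / f"nvdcve-1.1-{year}.json"
        with urlopen(FEED_URL.format(year=year)) as response:
            with gzip.GzipFile(fileobj=response) as archive:
                with target.open("wb") as out:
                    shutil.copyfileobj(archive, out)


if __name__ == "__main__":
    download_feeds()
```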
2. Pre-processing of vulnerability data
Afterwards, the files are pre-processed to make them compatible with the Elastic Stack parser. One step here is the expansion of the JSON files: the downloaded files contain JSON objects, but these are often nested, which makes it harder for the parser to identify individual objects. We therefore read the JSON and write all object separators onto separate lines. This way, a regex ( "^{" ) can precisely determine when a new object begins.
Furthermore, we strip the file of all unneeded metadata (e.g., author, version information, etc.), leaving only the CVE entries in the file as sequential JSON objects.
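A minimal Python sketch of this pre-processing step, under the assumption that the files follow the NVD JSON 1.1 feed layout (top-level metadata fields plus a CVE_Items array), could look as follows. The paths and field names are illustrative and may need to be adapted.

```python
import json
from pathlib import Path

RAW_DIR = Path("/srv/nvd/raw")          # downloaded feed files (assumed path)
PROCESSED_DIR = Path("/srv/nvd/processed")


def preprocess(feed_file: Path) -> None:
    """Strip feed metadata and write each CVE entry as an indented JSON object.

    Indenting the output puts every top-level '{' on its own line, so that the
    regex ^{ reliably marks the beginning of a new object for the parser.
    """
    with feed_file.open(encoding="utf-8") as handle:
        feed = json.load(handle)

    # Keep only the CVE entries; drop top-level metadata such as
    # CVE_data_type, CVE_data_format, CVE_data_version and CVE_data_timestamp.
    cve_items = feed["CVE_Items"]

    target = PROCESSED_DIR / feed_file.name
    with target.open("w", encoding="utf-8") as out:
        for item in cve_items:
            out.write(json.dumps(item, indent=2))
            out.write("\n")


if __name__ == "__main__":
    PROCESSED_DIR.mkdir(parents=True, exist_ok=True)
    for feed_file in sorted(RAW_DIR.glob("nvdcve-1.1-*.json")):
        preprocess(feed_file)
```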
3. Reading in the pre-processed vulnerability data using Logstash
After the pre-processing, our Logstash parser is able to read the individual lines of the files using the Multiline Codec (https://www.elastic.co/guide/en/logstash/current/plugins-codecs-multiline.html). Every time a complete JSON object is read in, Logstash forwards this CVE object to our Elasticsearch instance.
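A minimal Logstash pipeline configuration for this step might look like the following sketch. The file path, Elasticsearch host, and index name are placeholders; the multiline settings reflect the "^{" convention established during pre-processing.

```
input {
  file {
    path => "/srv/nvd/processed/*.json"   # pre-processed feed files (assumed path)
    start_position => "beginning"
    codec => multiline {
      pattern => "^{"             # a line starting with '{' begins a new CVE object
      negate => true
      what => "previous"          # all other lines belong to the previous object
      auto_flush_interval => 5    # flush the last object even if no new line follows
    }
  }
}

filter {
  json {
    source => "message"           # parse the collected lines as one JSON document
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]   # placeholder Elasticsearch instance
    index => "cve"                # placeholder index name
  }
}
```

With negate and what set this way, every line starting with "{" opens a new event, and all following lines are appended to it until the next object begins.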
The CVE Quick Search – Data formats and vulnerability queries
Once all CVE entries have been read and stored in the Elasticsearch database, we have to understand which format these entries have and how we can search them for specific products and product vulnerabilities. Our final result is illustrated in the following screenshot: using unique identifiers, we can return exact vulnerability reports for the queried product version.
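As an illustration of such an identifier-based query, the following Python sketch searches the Elasticsearch index directly via its REST API. The index name and the field paths (e.g., configurations.nodes.cpe_match.cpe23Uri, taken from the NVD JSON 1.1 schema) are assumptions that depend on the actual Logstash mapping, and the CPE identifier in the example is likewise only illustrative.

```python
import json

import requests

# Placeholder Elasticsearch endpoint and index name (see the pipeline sketch above).
SEARCH_URL = "http://localhost:9200/cve/_search"


def find_vulnerabilities(cpe_uri: str, size: int = 20) -> list:
    """Return CVE IDs and descriptions whose CPE configuration matches cpe_uri."""
    query = {
        "size": size,
        "query": {
            # Match the unique product identifier instead of searching
            # free-text descriptions.
            "match_phrase": {
                "configurations.nodes.cpe_match.cpe23Uri": cpe_uri
            }
        },
        # Assumed field paths from the NVD JSON 1.1 schema.
        "_source": [
            "cve.CVE_data_meta.ID",
            "cve.description.description_data.value",
        ],
    }
    response = requests.post(SEARCH_URL, json=query, timeout=30)
    response.raise_for_status()
    return [hit["_source"] for hit in response.json()["hits"]["hits"]]


if __name__ == "__main__":
    # Example: a specific Windows 10 release, identified by its CPE 2.3 URI.
    results = find_vulnerabilities(
        "cpe:2.3:o:microsoft:windows_10:1909:*:*:*:*:*:*:*"
    )
    print(json.dumps(results, indent=2))
```

For fully exact matches one would typically query a keyword sub-field or a dedicated CPE field; the sketch merely demonstrates the identifier-based approach as opposed to a free-text keyword search.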