Goals: Convert HTML-based CVE descriptions into a graph database

Description: Common Vulnerabilities and Exposures (CVE) is a list of security vulnerabilities discovered by security experts, each identified by a CVE-ID. The National Vulnerability Database (NVD) is a database maintained by the U.S. government that supports searching and statistical analysis and serves as a central information resource for CVEs. Essentially, the NVD is a database of CVEs. However, navigating the NVD is tedious, and it does not let users view multiple vulnerabilities together and draw logical inferences among them. In this project, we will navigate a subset of the NVD and convert the data into a graph database in Neo4j. A graph database represents relationships among the data entities it stores and supports intuitive queries in the Cypher query language.

Skills Required:
1. Text processing skills in some programming language, preferably Python.
2. Basic networking skills.

Informational Sites:
1. Full listing of all vulnerabilities: https://nvd.nist.gov/vuln/full-listing
2. Full listing of all CVE-IDs: http://cve.mitre.org/data/downloads/index.html
3. If a given CVE-ID is X, its information is available at https://nvd.nist.gov/vuln/detail/X
   E.g., take the CVE-ID CVE-2017-18008 and visit https://nvd.nist.gov/vuln/detail/CVE-2017-18008
   Fetching this page requires SSL support in Python. Example installation: http://www.webtop.com.au/blog/compiling-python-with-ssl-support-fedora-10-2009020237
   For other operating systems, search online for equivalent instructions.
4. Once the CVE description is obtained, populate this information into a graph database.

Specific Requirements:
1. For example, the CVE-ID CVE-2017-18008 has the following description:
   CVSS v3.0 Severity and Metrics:
   Base Score: 6.5 MEDIUM
   Vector: AV:N/AC:L/PR:N/UI:R/S:U/C:N/I:N/A:H (V3 legend)
   Impact Score: 3.6
   Exploitability Score: 2.8
   Attack Vector (AV): Network
   Attack Complexity (AC): Low
   Privileges Required (PR): None
   User Interaction (UI): Required
   Scope (S): Unchanged
   Confidentiality (C): None
   Integrity (I): None
   Availability (A): High
   This information can be represented with relations such as:
   (CVE-ID:(CVE-2017-18008) has_base_score Base Score:(6.5 MEDIUM))
   (CVE-ID:(CVE-2017-18008) is_encoded_as Vector: AV:N/AC:L/PR:N/UI:R/S:U/C:N/I:N/A:H (V3 legend))
   (CVE-ID:(CVE-2017-18008) is_attackable_through Attack Vector (AV): (Network))
   (CVE-ID:(CVE-2017-18008) has_impact Impact Score:(3.6))
   (CVE-ID:(CVE-2017-18008) was_modified Date: (01/01/2018))
   And so on.
2. Additional information may be gathered from the CVE-ID description. In particular, parsing the downloaded HTML file yields the name and version of the software that has this vulnerability.
3. The technical analysis of the vulnerability can be found on the same page under "Analysis Description".
4. Finally, the English description of the vulnerability with CVE-ID X can be found at http://cve.mitre.org/cgi-bin/cvename.cgi?name=X
5. Once the graph database is populated, execute and show the following queries (further queries will be provided later):
   a. How many vulnerabilities are found in the same software with the same impact score?
   b. How many vulnerabilities are found in the same software with the same impact score across two consecutive years, for all the years reported?
   c. How many vulnerabilities required User Interaction through the network?
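As a rough illustration of requirement 1, the sketch below turns the "Label: value" metric lines of a CVE page into relation tuples and then into parameterized Cypher MERGE statements. It is a minimal sketch under stated assumptions, not a prescribed design: the node labels CVE and Metric, the property names, the function names, and the relation names has_exploitability and requires_user_interaction are all hypothetical. Only has_base_score, is_encoded_as, has_impact, and is_attackable_through are taken from the example relations above. The input is assumed to be plain text already extracted from the HTML.

```python
# Hypothetical mapping from metric labels on the NVD page to relation names.
# The last two relation names are illustrative and not from the project text.
RELATIONS = {
    "Base Score": "has_base_score",
    "Vector": "is_encoded_as",
    "Impact Score": "has_impact",
    "Attack Vector (AV)": "is_attackable_through",
    "Exploitability Score": "has_exploitability",
    "User Interaction (UI)": "requires_user_interaction",
}

SAMPLE = """\
CVSS v3.0 Severity and Metrics:
Base Score: 6.5 MEDIUM
Vector: AV:N/AC:L/PR:N/UI:R/S:U/C:N/I:N/A:H (V3 legend)
Impact Score: 3.6
Exploitability Score: 2.8
Attack Vector (AV): Network
User Interaction (UI): Required
"""


def metrics_to_triples(cve_id, text):
    """Return (cve_id, relation, label, value) tuples for known metric lines."""
    triples = []
    for line in text.splitlines():
        # Split on the first colon only, so the vector string stays intact.
        label, _, value = line.partition(":")
        label, value = label.strip(), value.strip()
        relation = RELATIONS.get(label)
        if relation and value:  # skips headers and unmapped labels
            triples.append((cve_id, relation, label, value))
    return triples


def triple_to_cypher(triple):
    """Build a Cypher MERGE statement and its parameters for one tuple.

    Values are passed as query parameters. The relationship type cannot be
    a parameter in Cypher, so it is interpolated from the fixed RELATIONS map.
    """
    cve_id, relation, label, value = triple
    query = (
        "MERGE (c:CVE {id: $cve_id}) "
        "MERGE (m:Metric {name: $name, value: $value}) "
        f"MERGE (c)-[:{relation}]->(m)"
    )
    return query, {"cve_id": cve_id, "name": label, "value": value}


if __name__ == "__main__":
    for t in metrics_to_triples("CVE-2017-18008", SAMPLE):
        query, params = triple_to_cypher(t)
        print(query, params)
```

Each (query, params) pair could then be passed to a Neo4j session, for instance via session.run(query, params) in the official Python driver. Using MERGE rather than CREATE means that loading the same CVE twice does not duplicate nodes or relationships.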