Inconsistent measurement and incorrect detection of software names in security vulnerability reports

Bibliographic details
Published in: Computers & security 2023-12, Vol.135, p.103477, Article 103477
Main authors: Sun, Hongyu; Ou, Guoliang; Zheng, Ziqiu; Liao, Lei; Wang, He; Zhang, Yuqing
Format: Article
Language: English
Description
Abstract: As the number of vulnerability databases established by various nations continues to grow, they have accumulated hundreds of thousands of security vulnerability reports, which play a crucial role in protecting system security. However, many databases lack essential information, contain inaccuracies, or are inconsistent with one another. Despite these problems, the importance of vulnerability databases continues to grow. Existing research on vulnerability databases is limited to software versions and vulnerability reproduction; software names, an essential component of these databases, have not been studied extensively. Understanding how consistent software names are across vulnerability databases is crucial for improving their accuracy. This paper introduces VERNIER, an automated method for measuring inconsistencies in 789,954 sets of software names from nine security vulnerability databases (including CVE and NVD) covering 1999 to 2019. We use a named entity recognition (NER) model with an accuracy of 99.5% and an F1 score of 95.1% to extract software names from unstructured Chinese and English vulnerability reports. VERNIER assesses the inconsistency of software names at the character level and the semantic level. The results indicate that inconsistent software names are prevalent in vulnerability databases: the average exact-match rate between NVD and other mainstream databases, such as CVE, is only 20.3% at the character level and 43.3% at the semantic level. We also find internal inconsistencies between the structured and unstructured software names within a single vulnerability database (e.g., NVD). To mitigate the inconsistency, we implement an alert tool that uses these inconsistencies to detect incorrect software names. The tool can effectively warn about and correct such names.
ISSN: 0167-4048
DOI: 10.1016/j.cose.2023.103477
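
The character-level exact-match rate mentioned in the abstract can be illustrated with a small sketch. This is not VERNIER itself: the record does not describe the method in detail, so the code below assumes that two databases "match" on a vulnerability when their sets of software names are identical after lowercasing and collapsing whitespace. The function names, the vulnerability IDs, and the software names are all hypothetical, and the semantic-level comparison is not covered.

```python
def normalize(name: str) -> str:
    """Lowercase and collapse whitespace (an assumed normalization step)."""
    return " ".join(name.lower().split())


def exact_match_rate(db_a: dict[str, list[str]], db_b: dict[str, list[str]]) -> float:
    """Fraction of vulnerability IDs present in both databases whose
    normalized software-name sets are identical."""
    shared_ids = db_a.keys() & db_b.keys()
    if not shared_ids:
        return 0.0
    matches = sum(
        1
        for vid in shared_ids
        if {normalize(n) for n in db_a[vid]} == {normalize(n) for n in db_b[vid]}
    )
    return matches / len(shared_ids)


# Made-up records: IDs and names are placeholders, not real database entries.
nvd = {
    "CVE-0000-0001": ["Example Server"],
    "CVE-0000-0002": ["Example Browser"],
}
other = {
    "CVE-0000-0001": ["example  server"],       # differs only in case and spacing
    "CVE-0000-0002": ["Example Web Browser"],   # different name string
}

rate = exact_match_rate(nvd, other)
print(f"character-level exact-match rate: {rate:.1%}")
```

In this toy input, only one of the two shared IDs has identical name sets, which shows how names that plausibly refer to the same product can still fail a character-level comparison.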