CODE-SMASH: Source-Code Vulnerability Detection Using Siamese and Multi-Level Neural Architecture

The rapid evolution of software development, propelled by competitive demands and the continuous integration of new features, frequently leads to inadvertent security oversights. Traditional security practices, often reactive in nature, primarily focus on identifying known vulnerabilities, creating...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	IEEE access 2024, Vol.12, p.102492-102504
Hauptverfasser:	Han, Sungmin, Nam, Hyunkyung, Kang, Jaesik, Kim, Kwangsoo, Cho, Seungjae, Lee, Sangkyun
Format:	Artikel
Sprache:	eng
Schlagworte:	Chromium Code similarity Codes Computer architecture Datasets Detection algorithms hierarchical attention network Neural networks Programming languages Security Siamese neural network Software development Software development management Software reliability Source code source code vulnerability detection Source coding Static analysis Vectors
Online-Zugang:	Volltext
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	The rapid evolution of software development, propelled by competitive demands and the continuous integration of new features, frequently leads to inadvertent security oversights. Traditional security practices, often reactive in nature, primarily focus on identifying known vulnerabilities, creating a significant shortfall in detecting emergent, zero-day threats. This paper introduces CODE-SMASH, a novel deep learning-based source code vulnerability detector that utilizes a Siamese neural network with a hierarchical architecture integrating BiGRU and attention mechanisms. Our experiments using real-world datasets, specifically the Chromium and Debian datasets, demonstrate CODE-SMASH's superiority over existing methods. It achieves significant improvements in detection performance across all key metrics, including accuracy, precision, recall, and F1-score, with average improvements of approximately 8.3%, 11.6%, 27.75%, and 17.7%, respectively, compared to the best-performing existing methods in our experiments. Moreover, CODE-SMASH shows its superior capability in handling complex and lengthy code sequences, with performance improvements for long-length code (60 to 80 lines) in F1 scores of 4.53 percentage points on the Chromium dataset and 5.62 percentage points on the Debian dataset compared to the second-best model's performance. We believe our research makes a significant contribution to the field of automated vulnerability detection by providing a high-precision solution to the growing challenges in software security. Furthermore, based on our findings, we anticipate that future research could enhance CODE-SMASH by expanding its generalizability to various programming languages and reducing computational demands to improve efficiency.
ISSN:	2169-3536 2169-3536
DOI:	10.1109/ACCESS.2024.3432323