A dual graph neural networks model using sequence embedding as graph nodes for vulnerability detection

Detecting critical to ensure software system security. The traditional static vulnerability detection methods are limited by staff expertise and perform poorly with today’s increasingly complex software systems. Researchers have successfully applied the techniques used in NLP to vulnerability detect...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Information and software technology 2025-01, Vol.177, p.107581, Article 107581
Hauptverfasser: Ling, Miaogui, Tang, Mingwei, Bian, Deng, Lv, Shixuan, Tang, Qi
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Detecting critical to ensure software system security. The traditional static vulnerability detection methods are limited by staff expertise and perform poorly with today’s increasingly complex software systems. Researchers have successfully applied the techniques used in NLP to vulnerability detection as deep learning has developed. The existing deep learning-based vulnerability detection models can be divided into sequence-based and graph-based categories. Sequence-based embedding models cannot use structured information embedded in the code, and graph-based embedding models lack effective node representations. To solve these problems, we propose a deep learning-based method, DGVD (Double Graph Neural Network for Vulnerability Detection). We use the sequential neural network approach to extract local semantic features of the code as nodes embedded in the control flow graph. First, we propose a dual graph neural network module (DualGNN) that consists of GCN and GAT. The altered module utilizes two different graph neural networks to obtain the global structural information of the control flow and the relationship between the nodes and fuses the two. Second, we propose a convolution-based feature enhancement module (TC-FE) that uses different convolution kernels of different sizes to capture information at different scales so that subsequent readout layers can better aggregate node information. Experiments demonstrate that DGVD outperforms existing models, obtaining 64.23% vulnerability detection accuracy on CodeXGLUE’s real benchmark dataset. The proposed DGVD achieves better performance than the state-of-the-art DGVD has a more effective source code feature extraction capability on real-world datasets.
ISSN:0950-5849
DOI:10.1016/j.infsof.2024.107581