Toward Improved Deep Learning-based Vulnerability Detection
Main authors: | , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Summary: | Deep learning (DL) has been a common thread across several recent techniques
for vulnerability detection. The rise of large, publicly available datasets of
vulnerabilities has fueled the learning process underpinning these techniques.
While these datasets help the DL-based vulnerability detectors, they also
constrain these detectors' predictive abilities. Vulnerabilities in these
datasets have to be represented in a certain way, e.g., code lines, functions,
or program slices within which the vulnerabilities exist. We refer to this
representation as a base unit. The detectors learn how base units can be
vulnerable and then predict whether other base units are vulnerable. We have
hypothesized that this focus on individual base units harms the ability of the
detectors to properly detect those vulnerabilities that span multiple base
units (or MBU vulnerabilities). For vulnerabilities such as these, a correct
detection occurs when all comprising base units are detected as vulnerable.
Verifying how existing techniques perform in detecting all parts of a
vulnerability is important to establish their effectiveness for other
downstream tasks. To evaluate our hypothesis, we conducted a study focusing on
three prominent DL-based detectors: ReVeal, DeepWukong, and LineVul. Our study
shows that the datasets of all three detectors contain MBU vulnerabilities.
Further, we observed significant accuracy drops when detecting these
types of vulnerabilities. We present our study and a framework that can be used
to help DL-based detectors toward the proper inclusion of MBU vulnerabilities. |
---|---|
DOI: | 10.48550/arxiv.2403.03024 |
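
The summary states that an MBU vulnerability counts as correctly detected only when all of its base units are flagged as vulnerable. A minimal sketch of that criterion follows. It is an illustration and not the framework from the paper: the function name `mbu_detection_rates`, the dictionary layout, and the unit and vulnerability identifiers are all hypothetical.

```python
from collections import defaultdict


def mbu_detection_rates(predictions, unit_to_vuln):
    """Return (full_rate, partial_rate) over multi-base-unit vulnerabilities.

    predictions:  dict mapping base-unit id -> bool (flagged as vulnerable)
    unit_to_vuln: dict mapping base-unit id -> vulnerability id
    Both structures are assumptions made for this sketch.
    """
    # Group base units by the vulnerability they belong to.
    units_by_vuln = defaultdict(list)
    for unit, vuln in unit_to_vuln.items():
        units_by_vuln[vuln].append(unit)

    # Keep only vulnerabilities that span more than one base unit.
    mbu = [units for units in units_by_vuln.values() if len(units) > 1]
    if not mbu:
        return 0.0, 0.0

    # Full detection: every base unit of the vulnerability is flagged.
    full = sum(all(predictions.get(u, False) for u in units) for units in mbu)
    # Partial detection: at least one base unit is flagged.
    partial = sum(any(predictions.get(u, False) for u in units) for units in mbu)
    return full / len(mbu), partial / len(mbu)


# Hypothetical data: two MBU vulnerabilities and one single-unit vulnerability.
unit_to_vuln = {"f1": "V-A", "f2": "V-A", "f3": "V-B", "f4": "V-B", "f5": "V-C"}
predictions = {"f1": True, "f2": True, "f3": True, "f4": False, "f5": True}

full_rate, partial_rate = mbu_detection_rates(predictions, unit_to_vuln)
print(full_rate, partial_rate)
```

In this made-up example every MBU vulnerability has at least one flagged unit, yet only one of the two is detected in full, which is the kind of gap a per-base-unit accuracy figure would hide.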