Similar but Patched Code Considered Harmful -- The Impact of Similar but Patched Code on Recurring Vulnerability Detection and How to Remove Them
Main authors: | , , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Summary: | Identifying recurring vulnerabilities is crucial for ensuring software
security. Clone-based techniques, while widely used, often generate many false
alarms because of similar but patched (SBP) code: code that resembles
vulnerable code but is no longer vulnerable because it has been patched. Although
SBP code poses a great challenge to the effectiveness of existing
approaches, it has not yet been well explored.
In this paper, we propose a programming-language-agnostic framework, Fixed
Vulnerability Filter (FVF), to identify and filter such SBP instances in
vulnerability detection. Unlike existing studies that leverage function
signatures, our approach analyzes code change histories to precisely pinpoint
SBPs and consequently reduce false alarms. Evaluation under practical scenarios
confirms the effectiveness and precision of our approach. Remarkably, FVF
identifies and filters 65.1% of the false alarms from four vulnerability detection
tools (i.e., ReDeBug, VUDDY, MVP, and an elementary hash-based approach)
without yielding false positives.
We further apply FVF to 1,081 real-world software projects and construct a
real-world SBP dataset containing 6,827 SBP functions. Because of the nature of SBP code,
the dataset can act as a strict benchmark for testing how sensitively a
vulnerability detection approach distinguishes real vulnerabilities from
SBPs. Using this dataset, we demonstrate the ineffectiveness of four
state-of-the-art deep learning-based vulnerability detection approaches. Our
dataset can help developers evaluate vulnerability
detection approaches more realistically and paves the way for further exploration of
real-world SBP scenarios. |
---|---|
DOI: | 10.48550/arxiv.2412.20740 |
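
Illustrative note: the summary says FVF uses code change histories to separate SBP code from genuinely vulnerable clones, but it does not describe the algorithm. The Python sketch below is not FVF; it is a toy filter under stated assumptions, meant only to show the general idea of dropping a clone-detector alert when the flagged function's history shows that the known patch was applied. All names (`Alert`, `is_similar_but_patched`, `filter_alerts`, `VULNERABLE`, `PATCHED`) and the whitespace-stripping fingerprint are hypothetical choices made for this example.

```python
import hashlib
import re
from dataclasses import dataclass, field

# Hypothetical sample pair: a known vulnerable function and its patched version.
VULNERABLE = "int f(char *s){ char b[8]; strcpy(b, s); return 0; }"
PATCHED = "int f(char *s){ char b[8]; strncpy(b, s, 7); b[7] = 0; return 0; }"


def normalize(code: str) -> str:
    """Drop C-style comments and all whitespace so formatting does not matter."""
    code = re.sub(r"/\*.*?\*/", "", code, flags=re.DOTALL)  # block comments
    code = re.sub(r"//[^\n]*", "", code)                     # line comments
    return re.sub(r"\s+", "", code)


def fingerprint(code: str) -> str:
    """Hash of the normalized function body."""
    return hashlib.sha256(normalize(code).encode("utf-8")).hexdigest()


@dataclass
class Alert:
    """One clone-detector hit (assumed input format, not a real tool's output)."""
    function_name: str
    current_body: str
    # Earlier versions of the same function, oldest first.
    history: list = field(default_factory=list)


def is_similar_but_patched(alert: Alert, vulnerable: str, patched: str) -> bool:
    """Toy rule: the patch appears in the function's history (or is its current
    version) and the current version is not the known vulnerable one."""
    current = fingerprint(alert.current_body)
    if current == fingerprint(vulnerable):
        return False
    seen = [fingerprint(v) for v in alert.history] + [current]
    return fingerprint(patched) in seen


def filter_alerts(alerts, vulnerable: str, patched: str):
    """Keep only the alerts that the toy rule does not classify as SBP."""
    return [a for a in alerts if not is_similar_but_patched(a, vulnerable, patched)]
```

A real filter would need to read versions from a version control system and handle partial patches, renames, and reverted fixes, none of which this sketch attempts.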