CoCoFuzzing: Testing Neural Code Models with Coverage-Guided Fuzzing
Format: | Article |
Language: | English |
Abstract: | Deep learning-based code processing models have shown good performance on
tasks such as predicting method names, summarizing programs, and generating
comments. However, despite this progress, deep learning models are often prone
to adversarial attacks, which can significantly threaten their robustness and
generalizability by leading them to misclassify unexpected inputs. Many deep
learning testing approaches have been proposed to address this issue; however,
they mainly focus on deep learning applications for image, audio, and text
analysis, and cannot be applied directly to neural models of code because of
the unique properties of programs. In this paper, we propose CoCoFuzzing, a
coverage-based fuzzing framework for testing deep learning-based code
processing models. In particular, we first propose ten mutation operators that
automatically generate valid, semantics-preserving source code examples as
tests; we then propose a neuron coverage-based approach to guide the generation
of tests. We investigate the performance of CoCoFuzzing on three
state-of-the-art neural code models: NeuralCodeSum, CODE2SEQ, and CODE2VEC.
Our experimental results demonstrate that CoCoFuzzing can generate valid,
semantics-preserving source code examples for testing the robustness and
generalizability of these models, and that it improves neuron coverage.
Moreover, these tests can be used to improve the performance of the target
neural code models through adversarial retraining. |
DOI: | 10.48550/arxiv.2106.09242 |
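The abstract does not list the ten mutation operators, so the following is only a minimal sketch of what a "valid and semantics-preserving" transformation of source code can look like. The two operators shown (consistent variable renaming and insertion of an unused local variable) and all function names are hypothetical illustrations, not the paper's operators. They work on Java-like text with regular expressions instead of a parser, so they ignore string literals, comments, and nested scopes.

```python
import re

# Hypothetical, simplified semantics-preserving mutation operators for
# Java-like source text. They use regular expressions rather than a parser,
# so they do not handle string literals, comments, or nested scopes.


def rename_variable(code: str, old: str, new: str) -> str:
    """Consistently rename identifier `old` to `new` (whole-word matches only)."""
    # Refuse names already present: a clash could change which variable is referenced.
    if re.search(rf"\b{re.escape(new)}\b", code):
        raise ValueError(f"identifier {new!r} already occurs in the code")
    return re.sub(rf"\b{re.escape(old)}\b", new, code)


def fresh_name(code: str, prefix: str = "unused") -> str:
    """Return the first identifier prefix0, prefix1, ... that does not occur in `code`."""
    k = 0
    while re.search(rf"\b{re.escape(prefix)}{k}\b", code):
        k += 1
    return f"{prefix}{k}"


def insert_unused_variable(code: str) -> str:
    """Declare an unused local variable right after the first opening brace."""
    pos = code.find("{")
    if pos == -1:
        return code  # no method body found; leave the input unchanged
    decl = f" int {fresh_name(code)} = 0;"
    return code[: pos + 1] + decl + code[pos + 1 :]
```

For example, `rename_variable("int add(int a, int b) { return a + b; }", "a", "x")` returns `"int add(int x, int b) { return x + b; }"`; the word-boundary match leaves the method name `add` untouched. A model whose prediction changes under such a rewrite is not robust to a change that, by construction, keeps the program's behavior the same.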
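The abstract states that test generation is guided by neuron coverage but gives no details. The sketch below therefore shows one common reading of that idea, assuming the threshold-based definition of neuron coverage: a neuron counts as covered once some input drives its activation above a threshold. A mutant is kept only if it covers at least one new neuron. The `activations` callable is a hypothetical stand-in for reading hidden-layer activations from a model such as CODE2VEC, and it is assumed to return values already scaled to [0, 1]. The threshold of 0.5 and the round-robin choice of parent inputs are also assumptions, not the paper's settings.

```python
from typing import Callable, List, Sequence, Set, Tuple

# Hypothetical interface: maps a source-code input to the (already scaled)
# activations of the model's neurons, e.g. read from a hidden layer.
ActivationFn = Callable[[str], Sequence[float]]
# Hypothetical mutation hook: (parent input, step index) -> mutated input.
MutateFn = Callable[[str, int], str]


def covered_by(acts: Sequence[float], threshold: float) -> Set[int]:
    """Indices of neurons whose activation exceeds the threshold."""
    return {i for i, a in enumerate(acts) if a > threshold}


def coverage_guided_fuzz(
    seeds: List[str],
    mutate: MutateFn,
    activations: ActivationFn,
    num_neurons: int,
    threshold: float = 0.5,
    iterations: int = 100,
) -> Tuple[List[str], float]:
    """Keep only mutants that cover new neurons; return them and the final neuron coverage."""
    if not seeds:
        raise ValueError("need at least one seed input")
    covered: Set[int] = set()
    for seed in seeds:
        covered |= covered_by(activations(seed), threshold)
    queue = list(seeds)  # inputs eligible for further mutation
    kept: List[str] = []
    for step in range(iterations):
        parent = queue[step % len(queue)]  # simple round-robin parent selection
        child = mutate(parent, step)
        new = covered_by(activations(child), threshold) - covered
        if new:  # the mutant reached previously uncovered neurons
            covered |= new
            queue.append(child)
            kept.append(child)
    return kept, len(covered) / num_neurons
```

Kept mutants are added back to the queue, so inputs that have already reached new model behavior become parents for further mutation. The returned coverage is the fraction of the `num_neurons` neurons that were activated above the threshold by at least one input.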