CoCoFuzzing: Testing Neural Code Models with Coverage-Guided Fuzzing
Main authors: | Wei, Moshi ; Huang, Yuchao ; Yang, Jinqiu ; Wang, Junjie ; Wang, Song |
---|---|
Format: | Article |
Language: | eng |
Keywords: | |
Online access: | Order full text |
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | |
container_volume | |
creator | Wei, Moshi ; Huang, Yuchao ; Yang, Jinqiu ; Wang, Junjie ; Wang, Song |
description | Deep learning-based code processing models have shown good performance on
tasks such as predicting method names, summarizing programs, and generating
comments. However, despite this tremendous progress, deep learning models are
often prone to adversarial attacks, which can seriously threaten their
robustness and generalizability by causing them to misclassify unexpected
inputs. Many deep learning testing approaches have been proposed to address
this issue; however, they mainly target deep learning applications in image,
audio, and text analysis, and cannot be applied directly to neural models of
code because of the unique properties of programs. In this paper, we propose
CoCoFuzzing, a coverage-guided fuzzing framework for testing deep
learning-based code processing models. Specifically, we first propose ten
mutation operators that automatically generate valid, semantics-preserving
source code examples as tests; we then propose a neuron coverage-based approach
to guide test generation. We evaluate CoCoFuzzing on three state-of-the-art
neural code models: NeuralCodeSum, CODE2SEQ, and CODE2VEC. Our experimental
results show that CoCoFuzzing generates valid, semantics-preserving source
code examples for testing the robustness and generalizability of these models
and improves neuron coverage. Moreover, these tests can be used to improve the
performance of the target neural code models through adversarial retraining. |
doi_str_mv | 10.48550/arxiv.2106.09242 |
format | Article |
date | 2021-06-17 |
rights | http://creativecommons.org/licenses/by-sa/4.0 |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.2106.09242 |
ispartof | |
issn | |
language | eng |
recordid | cdi_arxiv_primary_2106_09242 |
source | arXiv.org |
subjects | Computer Science - Software Engineering |
title | CoCoFuzzing: Testing Neural Code Models with Coverage-Guided Fuzzing |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-25T09%3A16%3A07IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=CoCoFuzzing:%20Testing%20Neural%20Code%20Models%20with%20Coverage-Guided%20Fuzzing&rft.au=Wei,%20Moshi&rft.date=2021-06-17&rft_id=info:doi/10.48550/arxiv.2106.09242&rft_dat=%3Carxiv_GOX%3E2106_09242%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |
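The abstract names two mechanisms: mutation operators that produce valid, semantics-preserving variants of source code, and neuron coverage used to guide which generated tests are kept. The Python sketch below illustrates both, under loudly stated assumptions. It mutates Python rather than the Java inputs used by the evaluated models. It uses a single hypothetical operator (`rename_local`, which renames one assigned variable), and this is not claimed to be one of the paper's ten operators. It measures coverage as the fraction of hidden ReLU units whose activation exceeds an assumed threshold on at least one input, which is a common neuron-coverage definition. Finally, it replaces NeuralCodeSum, CODE2SEQ, and CODE2VEC with a random toy network (`ToyCodeModel`). All names, sizes, and thresholds are illustrative.

```python
import ast
import builtins
import math
import random
import re
import zlib

DIM, HIDDEN, THRESHOLD = 64, 32, 0.25  # assumed toy sizes and activation threshold

SEED = """
def total(xs):
    acc = 0
    for item in xs:
        acc = acc + item
    return acc
"""


def rename_local(source: str, rng: random.Random) -> str:
    """Hypothetical mutation operator: rename one assigned variable everywhere."""
    tree = ast.parse(source)
    # Protect bindings that a plain Name rename could break.
    protected = set(dir(builtins))
    for node in ast.walk(tree):
        if isinstance(node, ast.arg):
            protected.add(node.arg)
        elif isinstance(node, ast.alias):
            protected.add((node.asname or node.name).split(".")[0])
        elif isinstance(node, (ast.Global, ast.Nonlocal)):
            protected.update(node.names)
        elif isinstance(node, ast.ExceptHandler) and node.name:
            protected.add(node.name)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            protected.add(node.name)
        elif isinstance(node, ast.ClassDef):  # class-body names become attributes
            protected.add(node.name)
            protected.update(n.id for n in ast.walk(node) if isinstance(n, ast.Name))
    candidates = sorted({n.id for n in ast.walk(tree)
                         if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Store)
                         and n.id not in protected})
    if not candidates:
        return source
    old = rng.choice(candidates)
    taken = set(re.findall(r"\w+", source))
    new = f"v{rng.randrange(10**6)}"
    while new in taken:  # avoid capturing an existing identifier
        new = f"v{rng.randrange(10**6)}"
    for node in ast.walk(tree):
        if isinstance(node, ast.Name) and node.id == old:
            node.id = new
    return ast.unparse(tree)  # requires Python 3.9+


def featurize(source: str) -> list[float]:
    """Hashed bag-of-tokens vector, a stand-in for a real code model's encoder."""
    vec = [0.0] * DIM
    for tok in re.findall(r"\w+|\S", source):
        vec[zlib.crc32(tok.encode()) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class ToyCodeModel:
    """One random ReLU layer; NOT NeuralCodeSum, CODE2SEQ or CODE2VEC."""
    def __init__(self, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w = [[rng.gauss(0.0, 1.0) for _ in range(HIDDEN)] for _ in range(DIM)]
        self.b = [rng.gauss(0.0, 0.1) for _ in range(HIDDEN)]

    def hidden(self, source: str) -> list[float]:
        x = featurize(source)
        return [max(0.0, self.b[j] + sum(x[i] * self.w[i][j] for i in range(DIM)))
                for j in range(HIDDEN)]


def covered_neurons(model: ToyCodeModel, source: str) -> set[int]:
    """Indices of hidden neurons whose activation exceeds THRESHOLD on this input."""
    return {j for j, a in enumerate(model.hidden(source)) if a > THRESHOLD}


def fuzz(seeds: list[str], model: ToyCodeModel, iterations: int = 200,
         seed: int = 0) -> tuple[list[str], float]:
    """Greedy coverage-guided loop: keep a mutant only if it covers a new neuron."""
    rng = random.Random(seed)
    corpus = list(seeds)
    covered = set().union(*(covered_neurons(model, s) for s in corpus))
    for _ in range(iterations):
        child = rename_local(rng.choice(corpus), rng)
        gained = covered_neurons(model, child) - covered
        if gained:  # neuron coverage increased, so the mutant joins the corpus
            corpus.append(child)
            covered |= gained
    return corpus, len(covered) / HIDDEN


if __name__ == "__main__":
    model = ToyCodeModel(seed=0)
    corpus, coverage = fuzz([SEED], model, iterations=200, seed=1)
    print(f"corpus size: {len(corpus)}, neuron coverage: {coverage:.2f}")
```

Running the module fuzzes the `total` seed and prints the corpus size and the coverage reached. Every retained mutant still parses and computes the same sum, which is the sense of "valid and semantics-preserving" used here. For brevity, the sketch accepts a mutant only when it raises coverage. This greedy rule is an assumption, and the paper's own guidance strategy may differ.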