CoCoFuzzing: Testing Neural Code Models with Coverage-Guided Fuzzing

Deep learning-based code processing models have shown good performance for tasks such as predicting method names, summarizing programs, and comment generation. However, despite the tremendous progress, deep learning models are often prone to adversarial attacks, which can significantly threaten the...

Detailed description


Bibliographic details
Main authors:	Wei, Moshi; Huang, Yuchao; Yang, Jinqiu; Wang, Junjie; Wang, Song
Format:	Article
Language:	eng
Subjects:	Computer Science - Software Engineering
Online access:	Order full text

container_end_page
container_issue
container_start_page
container_title
container_volume
creator	Wei, Moshi; Huang, Yuchao; Yang, Jinqiu; Wang, Junjie; Wang, Song
description	Deep learning-based code processing models have shown good performance for tasks such as predicting method names, summarizing programs, and comment generation. However, despite the tremendous progress, deep learning models are often prone to adversarial attacks, which can significantly threaten the robustness and generalizability of these models by leading them to misclassification with unexpected inputs. To address the above issue, many deep learning testing approaches have been proposed; however, these approaches mainly focus on testing deep learning applications in the domains of image, audio, and text analysis, etc., and cannot be directly applied to neural models for code due to the unique properties of programs. In this paper, we propose a coverage-based fuzzing framework, CoCoFuzzing, for testing deep learning-based code processing models. In particular, we first propose ten mutation operators to automatically generate valid and semantically preserving source code examples as tests; then we propose a neuron coverage-based approach to guide the generation of tests. We investigate the performance of CoCoFuzzing on three state-of-the-art neural code models, i.e., NeuralCodeSum, CODE2SEQ, and CODE2VEC. Our experimental results demonstrate that CoCoFuzzing can generate valid and semantically preserving source code examples for testing the robustness and generalizability of these models and improve the neuron coverage. Moreover, these tests can be used to improve the performance of the target neural code models through adversarial retraining.
doi_str_mv	10.48550/arxiv.2106.09242
format	Article
fullrecord	source: arXiv.org (Open Access Repository); source record ID: 2106_09242; record ID: TN_cdi_arxiv_primary_2106_09242; record type: article; date: 2021-06-17; rights: http://creativecommons.org/licenses/by-sa/4.0; open access: free_for_read; collections: arXiv Computer Science, arXiv.org; link to source: https://arxiv.org/abs/2106.09242; backlink: https://doi.org/10.48550/arXiv.2106.09242
fulltext	fulltext_linktorsrc
identifier	DOI: 10.48550/arxiv.2106.09242
ispartof
issn
language	eng
recordid	cdi_arxiv_primary_2106_09242
source	arXiv.org
subjects	Computer Science - Software Engineering
title	CoCoFuzzing: Testing Neural Code Models with Coverage-Guided Fuzzing
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-25T09%3A16%3A07IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=CoCoFuzzing:%20Testing%20Neural%20Code%20Models%20with%20Coverage-Guided%20Fuzzing&rft.au=Wei,%20Moshi&rft.date=2021-06-17&rft_id=info:doi/10.48550/arxiv.2106.09242&rft_dat=%3Carxiv_GOX%3E2106_09242%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true