Invalidator: Automated Patch Correctness Assessment via Semantic and Syntactic Reasoning

Automated program repair (APR) has been gaining ground recently. However, a significant challenge remains: test overfitting, in which APR-generated patches plausibly pass the validation test suite but fail to generalize. A common practice for assessing the correctness of APR-generated patches is to judge whether they are equivalent to the ground truth, i.e., developer-written patches, by either generating additional test cases or relying on manual human inspection. The former often requires generating at least one test that exposes a behavioral difference between the APR-patched and developer-patched programs; finding such a test can be difficult, as the search space is enormous. The latter is prone to human bias and requires repetitive, expensive manual effort.

In this paper, we propose a novel technique, Invalidator, to automatically assess the correctness of APR-generated patches via semantic and syntactic reasoning. Invalidator leverages program invariants to reason about program semantics, while also capturing program syntax through language semantics learned from a large code corpus using a pre-trained language model. Given a buggy program and the developer-patched program, Invalidator infers likely invariants on both. It then determines that an APR-generated patch overfits if the patch (1) violates correct specifications or (2) maintains erroneous behaviors from the original buggy program. When the invariant-based check cannot determine that a patch overfits, Invalidator falls back on a model trained on labeled patches to assess patch correctness based on program syntax.

The benefit of Invalidator is threefold. First, it leverages both semantic and syntactic reasoning to enhance its discriminative capability. Second, it does not require new test cases to be generated; instead, it relies only on the current test suite and uses invariant inference to generalize program behaviors. Third, it is fully automated. We conducted experiments on a dataset of 885 patches generated for real-world programs in Defects4J. The results show that Invalidator correctly classified 79% of overfitting patches, detecting 23% more overfitting patches than the best baseline, and substantially outperformed the best baselines by 14% and 19% in Accuracy and F-Measure, respectively.
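The two-stage decision rule described in the abstract lends itself to a compact illustration. The following Python sketch models invariants as plain sets of strings and reduces the syntactic fallback to a pre-supplied classifier score; the function names, the invariant encoding, and the 0.5 threshold are hypothetical stand-ins for the paper's actual pipeline, which infers invariants dynamically from test executions and trains a classifier over embeddings from a pre-trained code language model.

```python
# Minimal sketch of Invalidator's two-stage assessment, per the abstract.
# Invariants are modeled as sets of strings; in the real system they are
# inferred from test executions. All names and thresholds are illustrative.

def semantic_overfitting(inv_apr: set, inv_correct: set, inv_buggy: set) -> bool:
    """Semantic check: flag the APR patch as overfitting if it (1) violates
    the correct specification inferred from the developer-patched program,
    or (2) maintains erroneous behavior inherited from the buggy program."""
    # (1) An invariant of the developer-patched program no longer holds.
    violates_spec = bool(inv_correct - inv_apr)
    # (2) A bug-specific invariant (holds on the buggy program but not on
    #     the correct one) still holds on the APR-patched program.
    keeps_error = bool((inv_buggy - inv_correct) & inv_apr)
    return violates_spec or keeps_error

def assess_patch(inv_apr, inv_correct, inv_buggy, syntactic_score):
    """Semantic reasoning first; syntactic classifier as the fallback."""
    if semantic_overfitting(inv_apr, inv_correct, inv_buggy):
        return "overfitting"
    # The semantic check may be inconclusive; fall back to a classifier
    # score (here just a supplied float in [0, 1]).
    return "overfitting" if syntactic_score > 0.5 else "likely correct"

if __name__ == "__main__":
    inv_correct = {"ret >= 0", "idx < len(a)"}
    inv_buggy = {"ret >= 0", "idx <= len(a)"}   # off-by-one behavior
    inv_apr = {"ret >= 0", "idx <= len(a)"}     # candidate patch keeps the bug
    print(assess_patch(inv_apr, inv_correct, inv_buggy, syntactic_score=0.1))
    # -> overfitting: violates the spec and maintains the buggy behavior
```

Note that the semantic stage can only ever flag overfitting; patches it cannot reject are passed to the syntactic stage, consistent with the abstract's description of the classifier as a fallback rather than an equal vote.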

Bibliographic Details

Published in: IEEE Transactions on Software Engineering, June 2023, Vol. 49, No. 6, pp. 1-20
Authors: Le-Cong, Thanh; Luong, Duc-Minh; Le, Xuan Bach D.; Lo, David; Tran, Nhat-Hoa; Quang-Huy, Bui; Huynh, Quyet-Thang
Format: Article
Language: English
DOI: 10.1109/TSE.2023.3255177
ISSN: 0098-5589
EISSN: 1939-3520
Source: IEEE Electronic Library (IEL)
Subjects: Automated Patch Correctness Assessment; Automated Program Repair; Automation; Code Representations; Inspection; Invariants; Overfitting problem; Patches (structures); Program Invariants; Reasoning; Semantics; Software testing; Syntax