A Differential Testing Approach for Evaluating Abstract Syntax Tree Mapping Algorithms

International Conference on Software Engineering 2021. Abstract syntax tree (AST) mapping algorithms are widely used to analyze changes in source code. Despite the foundational role of AST mapping algorithms, little effort has been made to evaluate the accuracy of AST mapping algorithms, i.e., the ex...

Bibliographic Details
Main authors: Fan, Yuanrui; Xia, Xin; Lo, David; Hassan, Ahmed E; Wang, Yuan; Li, Shanping
Format: Article
Language: eng
Subjects: Computer Science - Software Engineering
container_end_page
container_issue
container_start_page
container_title
container_volume
creator Fan, Yuanrui
Xia, Xin
Lo, David
Hassan, Ahmed E
Wang, Yuan
Li, Shanping
description International Conference on Software Engineering 2021. Abstract syntax tree (AST) mapping algorithms are widely used to analyze changes in source code. Despite their foundational role, little effort has been made to evaluate the accuracy of these algorithms, i.e., the extent to which an algorithm captures the evolution of code. We observe that a program element often has only one best-mapped program element. Based on this observation, we propose a hierarchical approach that automatically compares the similarity of the statements and tokens mapped by different algorithms. By performing the comparison, we determine whether each of the compared algorithms generates inaccurate mappings for a statement or its tokens. We invite 12 external experts to determine whether three commonly used AST mapping algorithms generate accurate mappings for a statement and its tokens, for 200 statements. Based on the experts' feedback, we observe that our approach achieves a precision of 0.98–1.00 and a recall of 0.65–0.75. Furthermore, we conduct a large-scale study with a dataset of ten Java projects, containing a total of 263,165 file revisions. Our approach determines that GumTree, MTDiff and IJM generate inaccurate mappings for 20%–29%, 25%–36% and 21%–30% of the file revisions, respectively. Our experimental results show that state-of-the-art AST mapping algorithms still need improvement.
doi_str_mv 10.48550/arxiv.2103.00141
format Article
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2103.00141
ispartof
issn
language eng
recordid cdi_arxiv_primary_2103_00141
source arXiv.org
subjects Computer Science - Software Engineering
title A Differential Testing Approach for Evaluating Abstract Syntax Tree Mapping Algorithms
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-21T08%3A43%3A31IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20Differential%20Testing%20Approach%20for%20Evaluating%20Abstract%20Syntax%20Tree%20Mapping%20Algorithms&rft.au=Fan,%20Yuanrui&rft.date=2021-02-27&rft_id=info:doi/10.48550/arxiv.2103.00141&rft_dat=%3Carxiv_GOX%3E2103_00141%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true