A Differential Testing Approach for Evaluating Abstract Syntax Tree Mapping Algorithms
International Conference on Software Engineering 2021. Abstract syntax tree (AST) mapping algorithms are widely used to analyze changes in source code. Despite the foundational role of AST mapping algorithms, little effort has been made to evaluate the accuracy of AST mapping algorithms, i.e., the ex...
Main authors: | Fan, Yuanrui; Xia, Xin; Lo, David; Hassan, Ahmed E; Wang, Yuan; Li, Shanping |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | |
container_volume | |
creator | Fan, Yuanrui; Xia, Xin; Lo, David; Hassan, Ahmed E; Wang, Yuan; Li, Shanping |
description | International Conference on Software Engineering 2021. Abstract syntax tree (AST) mapping algorithms are widely used to analyze
changes in source code. Despite the foundational role of AST mapping
algorithms, little effort has been made to evaluate their accuracy,
i.e., the extent to which an algorithm captures the evolution of
code. We observe that a program element often has only one best-mapped program
element. Based on this observation, we propose a hierarchical approach to
automatically compare the similarity of the statements and tokens mapped by
different algorithms. By performing the comparison, we determine whether each of the
compared algorithms generates inaccurate mappings for a statement or its
tokens. We invite 12 external experts to determine whether three commonly used AST
mapping algorithms generate accurate mappings for a statement and its tokens
for 200 statements. Based on the experts' feedback, we observe that our approach
achieves a precision of 0.98–1.00 and a recall of 0.65–0.75. Furthermore, we
conduct a large-scale study with a dataset of ten Java projects, containing a
total of 263,165 file revisions. Our approach determines that GumTree, MTDiff
and IJM generate inaccurate mappings for 20%–29%, 25%–36% and 21%–30% of the
file revisions, respectively. Our experimental results show that state-of-the-art
AST mapping algorithms still need improvement. |
doi_str_mv | 10.48550/arxiv.2103.00141 |
format | Article |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.2103.00141 |
ispartof | |
issn | |
language | eng |
recordid | cdi_arxiv_primary_2103_00141 |
source | arXiv.org |
subjects | Computer Science - Software Engineering |
title | A Differential Testing Approach for Evaluating Abstract Syntax Tree Mapping Algorithms |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-21T08%3A43%3A31IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20Differential%20Testing%20Approach%20for%20Evaluating%20Abstract%20Syntax%20Tree%20Mapping%20Algorithms&rft.au=Fan,%20Yuanrui&rft.date=2021-02-27&rft_id=info:doi/10.48550/arxiv.2103.00141&rft_dat=%3Carxiv_GOX%3E2103_00141%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |
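
The comparison idea summarized in the description field can be illustrated with a small sketch. This is not the authors' hierarchical approach: it is a simplified illustration that assumes each algorithm maps a source statement to at most one target statement and that token-level similarity from Python's `difflib` is an acceptable stand-in for the paper's similarity measure. The function names, the `tolerance` parameter, and the algorithm labels `algo_a`, `algo_b`, `algo_c` are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Token-level similarity in [0, 1] between two statements.

    Assumption: whitespace-separated tokens and difflib's ratio stand in
    for whatever similarity measure a real evaluation would use.
    """
    return SequenceMatcher(None, a.split(), b.split()).ratio()


def flag_inaccurate(source_stmt, mappings, tolerance=0.0):
    """Return the algorithms whose mapped statement is less similar to
    source_stmt than the best mapping produced by any algorithm.

    mappings: dict of algorithm name -> mapped target statement,
              or None when the algorithm left the statement unmapped.
    """
    if not mappings:
        return []
    scores = {
        name: similarity(source_stmt, target) if target is not None else 0.0
        for name, target in mappings.items()
    }
    best = max(scores.values())
    # If a statement has only one best-mapped counterpart, any algorithm
    # that picked a clearly less similar one is a candidate for inaccuracy.
    return sorted(name for name, s in scores.items() if best - s > tolerance)


# Hypothetical toy data: three algorithms map the same source statement.
source = "int total = count + 1 ;"
mappings = {
    "algo_a": "int total = count + 2 ;",
    "algo_b": "return total ;",
    "algo_c": "int total = count + 2 ;",
}
print(flag_inaccurate(source, mappings))  # ['algo_b']
```

A disagreement only shows that at least one algorithm is likely wrong. When all algorithms agree on the same poor mapping, a comparison like this reports nothing, which is one plausible reason a differential approach can have high precision but lower recall.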