Regression test prioritization leveraging source code similarity with tree kernels

Regression test prioritization (RTP) is an active research field, aiming at re‐ordering the tests in a test suite to maximize the rate at which faults are detected. A number of RTP strategies have been proposed, leveraging different factors to reorder tests. Some techniques include an analysis of ch...

Bibliographic details
Published in:	Journal of software : evolution and process 2024-08, Vol.36 (8), p.n/a
Main authors:	Altiero, Francesco; Corazza, Anna; Di Martino, Sergio; Peron, Adriano; Libero Lucio Starace, Luigi
Format:	Article
Language:	eng
Subjects:	Effectiveness; Fault detection; Faults; Natural language processing; regression testing; Similarity; Software testing; Source code; source code changes; Structured data; test prioritization; tree kernels
Online access:	Full text

container_end_page	n/a
container_issue	8
container_start_page
container_title	Journal of software : evolution and process
container_volume	36
creator	Altiero, Francesco ; Corazza, Anna ; Di Martino, Sergio ; Peron, Adriano ; Libero Lucio Starace, Luigi
description	Regression test prioritization (RTP) is an active research field, aiming at re‐ordering the tests in a test suite to maximize the rate at which faults are detected. A number of RTP strategies have been proposed, leveraging different factors to reorder tests. Some techniques include an analysis of changed source code, to assign higher priority to tests stressing modified parts of the codebase. Still, most of these change‐based solutions focus on simple text‐level comparisons among versions. We believe that measuring source code changes in a more refined way, capable of discriminating between mere textual changes (e.g., renaming of a local variable) and more structural changes (e.g., changes in the control flow), could lead to significant benefits in RTP, under the assumption that major structural changes are also more likely to introduce faults. To this end, we propose two novel RTP techniques that leverage tree kernels (TK), a class of similarity functions largely used in Natural Language Processing on tree‐structured data. In particular, we apply TKs to syntax trees of source code, to more precisely quantify the extent of structural changes in the source code, and prioritize tests accordingly. We assessed the effectiveness of the proposals by conducting an empirical study on five real‐world Java projects, also used in a number of RTP‐related papers. We automatically generated, for each considered pair of software versions (i.e., old version, new version) in the evolution of the involved projects, 100 variations with artificially injected faults, leading to over 5k different software evolution scenarios overall. We compared the proposed prioritization approaches against well‐known prioritization techniques, evaluating both their effectiveness and their execution times. Our findings show that leveraging more refined code change analysis techniques to quantify the extent of changes in source code can lead to relevant improvements in prioritization effectiveness, while typically introducing negligible overheads due to their execution. Not all changes to a codebase are equal: Some modifications (e.g., heavy refactoring) are more critical than others (e.g., renaming local variables). In this paper, we present two regression test prioritization techniques, namely, method‐level tree kernel prioritization (MTK) and method‐level tree kernel with quotient set (MTK‐QS), leveraging tree kernel functions to effectively measure the structural similarity of changed metho
doi_str_mv	10.1002/smr.2653
format	Article
fullrecord	<record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_3089037674</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>3089037674</sourcerecordid><originalsourceid>FETCH-LOGICAL-c2543-c0d31234e80dc4a1f0476ec2d18a10504e6b4d85d782e56a62988e13aab6a6f83</originalsourceid><addsrcrecordid>eNp1kE1Lw0AQhhdRsNSCP2HBi5fU_c72KMUvqAhRz8t2M4lb06Tuppb4691a8eZc5oOHmXdehM4pmVJC2FVchylTkh-hESMiz3Kh6fFfnfNTNIlxRVIoRqSQI1QUUAeI0Xct7iH2eBN8F3zvv2y_nzXwCcHWvq1x7LbBAXZdCTj6tW9s4ga88_0b7gMAfofQQhPP0EllmwiT3zxGr7c3L_P7bPF09zC_XmSOScEzR0pOGRegSemEpVUSqcCxkmpLiSQC1FKUWpa5ZiCVVWymNVBu7TI1leZjdHHYuwndxzZpN6uksE0nDSd6RniucpGoywPlQhdjgMqkD9c2DIYSszfNJNPM3rSEZgd05xsY_uXM82Pxw38D8q5uLg</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>3089037674</pqid></control><display><type>article</type><title>Regression test prioritization leveraging source code similarity with tree kernels</title><source>Access via Wiley Online Library</source><creator>Altiero, Francesco ; Corazza, Anna ; Di Martino, Sergio ; Peron, Adriano ; Libero Lucio Starace, Luigi</creator><creatorcontrib>Altiero, Francesco ; Corazza, Anna ; Di Martino, Sergio ; Peron, Adriano ; Libero Lucio Starace, Luigi</creatorcontrib><description>Regression test prioritization (RTP) is an active research field, aiming at re‐ordering the tests in a test suite to maximize the rate at which faults are detected. A number of RTP strategies have been proposed, leveraging different factors to reorder tests. Some techniques include an analysis of changed source code, to assign higher priority to tests stressing modified parts of the codebase. Still, most of these change‐based solutions focus on simple text‐level comparisons among versions. 
We believe that measuring source code changes in a more refined way, capable of discriminating between mere textual changes (e.g., renaming of a local variable) and more structural changes (e.g., changes in the control flow), could lead to significant benefits in RTP, under the assumption that major structural changes are also more likely to introduce faults. To this end, we propose two novel RTP techniques that leverage tree kernels (TK), a class of similarity functions largely used in Natural Language Processing on tree‐structured data. In particular, we apply TKs to syntax trees of source code, to more precisely quantify the extent of structural changes in the source code, and prioritize tests accordingly. We assessed the effectiveness of the proposals by conducting an empirical study on five real‐world Java projects, also used in a number of RTP‐related papers. We automatically generated, for each considered pair of software versions (i.e., old version, new version) in the evolution of the involved projects, 100 variations with artificially injected faults, leading to over 5k different software evolution scenarios overall. We compared the proposed prioritization approaches against well‐known prioritization techniques, evaluating both their effectiveness and their execution times. Our findings show that leveraging more refined code change analysis techniques to quantify the extent of changes in source code can lead to relevant improvements in prioritization effectiveness, while typically introducing negligible overheads due to their execution. Not all changes to a codebase are equal: Some modifications (e.g., heavy refactoring) are more critical than others (e.g., renaming local variables). 
In this paper, we present two regression test prioritization techniques, namely, method‐level tree kernel prioritization (MTK) and method‐level tree kernel with quotient set (MTK‐QS), leveraging tree kernel functions to effectively measure the structural similarity of changed methods and directing testing efforts towards code affected by more critical changes. Our experiments show that these techniques can significantly improve the fault detection rate than other traditional and widely used approaches.</description><identifier>ISSN: 2047-7473</identifier><identifier>EISSN: 2047-7481</identifier><identifier>DOI: 10.1002/smr.2653</identifier><language>eng</language><publisher>Chichester: Wiley Subscription Services, Inc</publisher><subject>Effectiveness ; Fault detection ; Faults ; Natural language processing ; regression testing ; Similarity ; Software testing ; Source code ; source code changes ; Structured data ; test prioritization ; tree kernels</subject><ispartof>Journal of software : evolution and process, 2024-08, Vol.36 (8), p.n/a</ispartof><rights>2024 John Wiley & Sons Ltd.</rights><rights>2024 John Wiley & Sons, Ltd.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c2543-c0d31234e80dc4a1f0476ec2d18a10504e6b4d85d782e56a62988e13aab6a6f83</cites><orcidid>0000-0001-7945-9014 ; 0000-0002-7111-3171 ; 0000-0001-7090-4249 ; 0000-0002-1019-9004 ; 0000-0002-9156-5079</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://onlinelibrary.wiley.com/doi/pdf/10.1002%2Fsmr.2653$$EPDF$$P50$$Gwiley$$H</linktopdf><linktohtml>$$Uhttps://onlinelibrary.wiley.com/doi/full/10.1002%2Fsmr.2653$$EHTML$$P50$$Gwiley$$H</linktohtml><link.rule.ids>314,780,784,1417,27924,27925,45574,45575</link.rule.ids></links><search><creatorcontrib>Altiero, 
Francesco</creatorcontrib><creatorcontrib>Corazza, Anna</creatorcontrib><creatorcontrib>Di Martino, Sergio</creatorcontrib><creatorcontrib>Peron, Adriano</creatorcontrib><creatorcontrib>Libero Lucio Starace, Luigi</creatorcontrib><title>Regression test prioritization leveraging source code similarity with tree kernels</title><title>Journal of software : evolution and process</title><description>Regression test prioritization (RTP) is an active research field, aiming at re‐ordering the tests in a test suite to maximize the rate at which faults are detected. A number of RTP strategies have been proposed, leveraging different factors to reorder tests. Some techniques include an analysis of changed source code, to assign higher priority to tests stressing modified parts of the codebase. Still, most of these change‐based solutions focus on simple text‐level comparisons among versions. We believe that measuring source code changes in a more refined way, capable of discriminating between mere textual changes (e.g., renaming of a local variable) and more structural changes (e.g., changes in the control flow), could lead to significant benefits in RTP, under the assumption that major structural changes are also more likely to introduce faults. To this end, we propose two novel RTP techniques that leverage tree kernels (TK), a class of similarity functions largely used in Natural Language Processing on tree‐structured data. In particular, we apply TKs to syntax trees of source code, to more precisely quantify the extent of structural changes in the source code, and prioritize tests accordingly. We assessed the effectiveness of the proposals by conducting an empirical study on five real‐world Java projects, also used in a number of RTP‐related papers. 
We automatically generated, for each considered pair of software versions (i.e., old version, new version) in the evolution of the involved projects, 100 variations with artificially injected faults, leading to over 5k different software evolution scenarios overall. We compared the proposed prioritization approaches against well‐known prioritization techniques, evaluating both their effectiveness and their execution times. Our findings show that leveraging more refined code change analysis techniques to quantify the extent of changes in source code can lead to relevant improvements in prioritization effectiveness, while typically introducing negligible overheads due to their execution. Not all changes to a codebase are equal: Some modifications (e.g., heavy refactoring) are more critical than others (e.g., renaming local variables). In this paper, we present two regression test prioritization techniques, namely, method‐level tree kernel prioritization (MTK) and method‐level tree kernel with quotient set (MTK‐QS), leveraging tree kernel functions to effectively measure the structural similarity of changed methods and directing testing efforts towards code affected by more critical changes. 
Our experiments show that these techniques can significantly improve the fault detection rate than other traditional and widely used approaches.</description><subject>Effectiveness</subject><subject>Fault detection</subject><subject>Faults</subject><subject>Natural language processing</subject><subject>regression testing</subject><subject>Similarity</subject><subject>Software testing</subject><subject>Source code</subject><subject>source code changes</subject><subject>Structured data</subject><subject>test prioritization</subject><subject>tree kernels</subject><issn>2047-7473</issn><issn>2047-7481</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><recordid>eNp1kE1Lw0AQhhdRsNSCP2HBi5fU_c72KMUvqAhRz8t2M4lb06Tuppb4691a8eZc5oOHmXdehM4pmVJC2FVchylTkh-hESMiz3Kh6fFfnfNTNIlxRVIoRqSQI1QUUAeI0Xct7iH2eBN8F3zvv2y_nzXwCcHWvq1x7LbBAXZdCTj6tW9s4ga88_0b7gMAfofQQhPP0EllmwiT3zxGr7c3L_P7bPF09zC_XmSOScEzR0pOGRegSemEpVUSqcCxkmpLiSQC1FKUWpa5ZiCVVWymNVBu7TI1leZjdHHYuwndxzZpN6uksE0nDSd6RniucpGoywPlQhdjgMqkD9c2DIYSszfNJNPM3rSEZgd05xsY_uXM82Pxw38D8q5uLg</recordid><startdate>202408</startdate><enddate>202408</enddate><creator>Altiero, Francesco</creator><creator>Corazza, Anna</creator><creator>Di Martino, Sergio</creator><creator>Peron, Adriano</creator><creator>Libero Lucio Starace, Luigi</creator><general>Wiley Subscription Services, Inc</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><orcidid>https://orcid.org/0000-0001-7945-9014</orcidid><orcidid>https://orcid.org/0000-0002-7111-3171</orcidid><orcidid>https://orcid.org/0000-0001-7090-4249</orcidid><orcidid>https://orcid.org/0000-0002-1019-9004</orcidid><orcidid>https://orcid.org/0000-0002-9156-5079</orcidid></search><sort><creationdate>202408</creationdate><title>Regression test prioritization leveraging source code similarity with tree 
kernels</title><author>Altiero, Francesco ; Corazza, Anna ; Di Martino, Sergio ; Peron, Adriano ; Libero Lucio Starace, Luigi</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c2543-c0d31234e80dc4a1f0476ec2d18a10504e6b4d85d782e56a62988e13aab6a6f83</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Effectiveness</topic><topic>Fault detection</topic><topic>Faults</topic><topic>Natural language processing</topic><topic>regression testing</topic><topic>Similarity</topic><topic>Software testing</topic><topic>Source code</topic><topic>source code changes</topic><topic>Structured data</topic><topic>test prioritization</topic><topic>tree kernels</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Altiero, Francesco</creatorcontrib><creatorcontrib>Corazza, Anna</creatorcontrib><creatorcontrib>Di Martino, Sergio</creatorcontrib><creatorcontrib>Peron, Adriano</creatorcontrib><creatorcontrib>Libero Lucio Starace, Luigi</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Journal of software : evolution and process</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Altiero, Francesco</au><au>Corazza, Anna</au><au>Di Martino, Sergio</au><au>Peron, Adriano</au><au>Libero Lucio Starace, Luigi</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Regression test prioritization leveraging source code similarity with tree 
kernels</atitle><jtitle>Journal of software : evolution and process</jtitle><date>2024-08</date><risdate>2024</risdate><volume>36</volume><issue>8</issue><epage>n/a</epage><issn>2047-7473</issn><eissn>2047-7481</eissn><abstract>Regression test prioritization (RTP) is an active research field, aiming at re‐ordering the tests in a test suite to maximize the rate at which faults are detected. A number of RTP strategies have been proposed, leveraging different factors to reorder tests. Some techniques include an analysis of changed source code, to assign higher priority to tests stressing modified parts of the codebase. Still, most of these change‐based solutions focus on simple text‐level comparisons among versions. We believe that measuring source code changes in a more refined way, capable of discriminating between mere textual changes (e.g., renaming of a local variable) and more structural changes (e.g., changes in the control flow), could lead to significant benefits in RTP, under the assumption that major structural changes are also more likely to introduce faults. To this end, we propose two novel RTP techniques that leverage tree kernels (TK), a class of similarity functions largely used in Natural Language Processing on tree‐structured data. In particular, we apply TKs to syntax trees of source code, to more precisely quantify the extent of structural changes in the source code, and prioritize tests accordingly. We assessed the effectiveness of the proposals by conducting an empirical study on five real‐world Java projects, also used in a number of RTP‐related papers. We automatically generated, for each considered pair of software versions (i.e., old version, new version) in the evolution of the involved projects, 100 variations with artificially injected faults, leading to over 5k different software evolution scenarios overall. 
We compared the proposed prioritization approaches against well‐known prioritization techniques, evaluating both their effectiveness and their execution times. Our findings show that leveraging more refined code change analysis techniques to quantify the extent of changes in source code can lead to relevant improvements in prioritization effectiveness, while typically introducing negligible overheads due to their execution. Not all changes to a codebase are equal: Some modifications (e.g., heavy refactoring) are more critical than others (e.g., renaming local variables). In this paper, we present two regression test prioritization techniques, namely, method‐level tree kernel prioritization (MTK) and method‐level tree kernel with quotient set (MTK‐QS), leveraging tree kernel functions to effectively measure the structural similarity of changed methods and directing testing efforts towards code affected by more critical changes. Our experiments show that these techniques can significantly improve the fault detection rate than other traditional and widely used approaches.</abstract><cop>Chichester</cop><pub>Wiley Subscription Services, Inc</pub><doi>10.1002/smr.2653</doi><tpages>28</tpages><orcidid>https://orcid.org/0000-0001-7945-9014</orcidid><orcidid>https://orcid.org/0000-0002-7111-3171</orcidid><orcidid>https://orcid.org/0000-0001-7090-4249</orcidid><orcidid>https://orcid.org/0000-0002-1019-9004</orcidid><orcidid>https://orcid.org/0000-0002-9156-5079</orcidid></addata></record>
fulltext	fulltext
identifier	ISSN: 2047-7473
ispartof	Journal of software : evolution and process, 2024-08, Vol.36 (8), p.n/a
issn	2047-7473 2047-7481
language	eng
recordid	cdi_proquest_journals_3089037674
source	Access via Wiley Online Library
subjects	Effectiveness ; Fault detection ; Faults ; Natural language processing ; regression testing ; Similarity ; Software testing ; Source code ; source code changes ; Structured data ; test prioritization ; tree kernels
title	Regression test prioritization leveraging source code similarity with tree kernels
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-24T18%3A56%3A41IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Regression%20test%20prioritization%20leveraging%20source%20code%20similarity%20with%20tree%20kernels&rft.jtitle=Journal%20of%20software%20:%20evolution%20and%20process&rft.au=Altiero,%20Francesco&rft.date=2024-08&rft.volume=36&rft.issue=8&rft.epage=n/a&rft.issn=2047-7473&rft.eissn=2047-7481&rft_id=info:doi/10.1002/smr.2653&rft_dat=%3Cproquest_cross%3E3089037674%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3089037674&rft_id=info:pmid/&rfr_iscdi=true