Regression test prioritization leveraging source code similarity with tree kernels

Regression test prioritization (RTP) is an active research field, aiming at re‐ordering the tests in a test suite to maximize the rate at which faults are detected. A number of RTP strategies have been proposed, leveraging different factors to reorder tests. Some techniques include an analysis of ch...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Journal of software : evolution and process 2024-08, Vol.36 (8), p.n/a
Hauptverfasser: Altiero, Francesco, Corazza, Anna, Di Martino, Sergio, Peron, Adriano, Libero Lucio Starace, Luigi
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Regression test prioritization (RTP) is an active research field, aiming at re‐ordering the tests in a test suite to maximize the rate at which faults are detected. A number of RTP strategies have been proposed, leveraging different factors to reorder tests. Some techniques include an analysis of changed source code, to assign higher priority to tests stressing modified parts of the codebase. Still, most of these change‐based solutions focus on simple text‐level comparisons among versions. We believe that measuring source code changes in a more refined way, capable of discriminating between mere textual changes (e.g., renaming of a local variable) and more structural changes (e.g., changes in the control flow), could lead to significant benefits in RTP, under the assumption that major structural changes are also more likely to introduce faults. To this end, we propose two novel RTP techniques that leverage tree kernels (TK), a class of similarity functions largely used in Natural Language Processing on tree‐structured data. In particular, we apply TKs to syntax trees of source code, to more precisely quantify the extent of structural changes in the source code, and prioritize tests accordingly. We assessed the effectiveness of the proposals by conducting an empirical study on five real‐world Java projects, also used in a number of RTP‐related papers. We automatically generated, for each considered pair of software versions (i.e., old version, new version) in the evolution of the involved projects, 100 variations with artificially injected faults, leading to over 5k different software evolution scenarios overall. We compared the proposed prioritization approaches against well‐known prioritization techniques, evaluating both their effectiveness and their execution times. Our findings show that leveraging more refined code change analysis techniques to quantify the extent of changes in source code can lead to relevant improvements in prioritization effectiveness, while typically introducing negligible overheads due to their execution. Not all changes to a codebase are equal: Some modifications (e.g., heavy refactoring) are more critical than others (e.g., renaming local variables). In this paper, we present two regression test prioritization techniques, namely, method‐level tree kernel prioritization (MTK) and method‐level tree kernel with quotient set (MTK‐QS), leveraging tree kernel functions to effectively measure the structural similarity of changed metho
ISSN:2047-7473
2047-7481
DOI:10.1002/smr.2653