Automatically Tagging the "AAA" Pattern in Unit Test Cases Using Machine Learning Models

Bibliographic Details
Published in: IEEE Transactions on Software Engineering, 2023-05, Vol. 49 (5), pp. 3305-3324
Authors: Wei, Chenhao; Xiao, Lu; Yu, Tingting; Chen, Xinyu; Wang, Xiao; Wong, Sunny; Clune, Abigail
Format: Article
Language: English
Description
Abstract: The AAA pattern (i.e., Arrange-Act-Assert) is a common and natural layout for creating a test case. Following this pattern in test cases may benefit comprehension, debugging, and maintenance. The AAA structure of real-life test cases, however, may not be clear due to their high complexity. Manually labeling AAA statements in test cases is tedious. Thus, we envision that an automated approach for labeling AAA statements in existing test cases could benefit new developers and projects that practice collective code ownership and test-driven development. This paper contributes an automatic approach based on machine learning models. The "secret sauce" of this approach is a set of three learning features based on the semantic, syntax, and context information in test cases, derived from the manual tagging process. Thus, our approach mimics how developers manually tag the AAA pattern of a test case. We assess the precision, recall, and F1 score of our approach on 449 test cases, containing about 16,612 statements, across 4 Apache open source projects. To achieve the best performance, we explore the use of six machine learning models; the contribution of the SMOTE data balancing technique; the comparison of the three learning features; and the comparison of five different methods for calculating the semantic feature. The results show our approach is able to identify Arrangement, Action, and Assertion statements with precision upwards of 92% and recall up to 74%. We also summarize some experience from our experiments regarding the choice of machine learning models, data balancing algorithm, and feature engineering methods, which could provide a reference for related future research.
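
To make the pattern concrete, below is a minimal sketch of the Arrange-Act-Assert layout. The paper studies Java test cases from Apache projects, but the pattern itself is language-agnostic; this sketch uses Python's built-in unittest, and the ShoppingCart class is a hypothetical stand-in, not an example from the paper.

    import unittest

    class ShoppingCart:
        """Hypothetical class under test, for illustration only."""
        def __init__(self):
            self._items = []

        def add_item(self, name, price):
            self._items.append((name, price))

        def total(self):
            return sum(price for _, price in self._items)

    class ShoppingCartTest(unittest.TestCase):
        def test_total_reflects_added_item(self):
            # Arrange: construct and prepare the object under test.
            cart = ShoppingCart()
            # Act: exercise the behavior of interest.
            cart.add_item("book", price=10)
            # Assert: verify the observed outcome.
            self.assertEqual(cart.total(), 10)

    if __name__ == "__main__":
        unittest.main()

In real-life tests the three phases are rarely this clean: setup, actions, and checks interleave, which is precisely why the paper proposes labeling each statement automatically.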
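The paper does not publish its exact pipeline, so the following is only a rough sketch of how a per-statement classifier with SMOTE balancing could be wired up. It assumes scikit-learn and imbalanced-learn (not necessarily the authors' tooling) and uses random placeholder features in place of the semantic, syntax, and context features described above.

    import numpy as np
    from imblearn.over_sampling import SMOTE
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import classification_report
    from sklearn.model_selection import train_test_split

    # Placeholder data: one row per test-case statement; the 16 columns
    # stand in for semantic/syntax/context features (synthetic noise here,
    # purely to show the wiring, not the authors' actual features).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 16))
    y = rng.choice(["arrange", "act", "assert"], size=1000, p=[0.7, 0.1, 0.2])

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, stratify=y, random_state=0)

    # Oversample the minority classes (e.g., "act") before training,
    # analogous to the paper's use of SMOTE.
    X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_train, y_train)

    # Random forest is a stand-in for whichever of the six compared
    # models performs best; the report gives per-class precision,
    # recall, and F1, the metrics the paper evaluates.
    clf = RandomForestClassifier(random_state=0).fit(X_bal, y_bal)
    print(classification_report(y_test, clf.predict(X_test)))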
ISSN: 0098-5589 (print), 1939-3520 (electronic)
DOI: 10.1109/TSE.2023.3252442