Enhancing software model encoding for feature location approaches based on machine learning techniques

Bibliographic details
Published in: Software and systems modeling 2022-02, Vol. 21 (1), pp. 399-433
Main authors: Marcén, Ana C., Pérez, Francisca, Pastor, Óscar, Cetina, Carlos
Format: Article
Language: English
Online access: Full text
Description
Abstract: Feature location is one of the main activities performed during software evolution. In our previous works, we proposed an approach for feature location in models based on machine learning, providing evidence that machine learning techniques can obtain better results than other retrieval techniques for feature location in models. However, to apply machine learning techniques optimally, the design of an encoding is essential to be able to identify the best realization of a feature. In this work, we present more thorough research about software model encoding for feature location approaches based on machine learning. As part of this study, we have provided two new software model encodings and compared them with the source encoding. The first proposed encoding is an extension of the source encoding to take advantage of not only the main concepts and relations of a domain but also the properties of these concepts and relations. The second proposed encoding is inspired by the characteristics used in benchmark datasets for research on Learning to Rank. Afterward, the new encodings are used to compare three different machine learning techniques (RankBoost, Feedforward Neural Network, and Recurrent Neural Network). The study also considers whether a domain-independent encoding such as the ones proposed in this work can outperform an encoding that is specifically designed to exploit human experience and domain knowledge. Furthermore, the results of the best encoding and the best machine learning technique were compared to two traditional approaches that have been widely applied for feature location as well as for traceability link recovery and bug localization. The evaluation is based on two real-world case studies, one in the railway domain and the other in the induction hob domain. An approach for feature location in models evaluates these case studies with the different encodings and machine learning techniques.
The results show that when using the second proposed encoding and RankBoost, the approach outperforms the results of the other encodings and machine learning techniques and the results of the traditional approaches. Specifically, the approach achieved the best results for all the performance indicators, providing a mean precision value of 90.11%, a recall value of 86.20%, an F-measure value of 87.22%, and an MCC value of 0.87. The statistical analysis of the results shows that this approach significantly improves the results and increases the magnitude of t
ISSN: 1619-1366, 1619-1374
DOI: 10.1007/s10270-021-00920-y
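
Note on the performance indicators: the abstract reports precision, recall, F-measure, and MCC but does not spell out how they are computed. The following is a minimal sketch using the standard confusion-matrix definitions, assuming each model element is classified as either part of the feature realization or not. The function name `performance_indicators` and the example counts are hypothetical and are not taken from the article.

```python
import math


def performance_indicators(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard confusion-matrix indicators (hypothetical helper, not the authors' code).

    tp/fp/tn/fn are counts of true positives, false positives,
    true negatives, and false negatives for one test case.
    Values are returned as fractions in [0, 1] (MCC in [-1, 1]).
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    # Matthews correlation coefficient; defined as 0 when any marginal is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Made-up counts for illustration only.
    print(performance_indicators(tp=8, fp=2, tn=9, fn=1))
```

The figures quoted in the abstract are means, presumably averaged over individual test cases, so the reported F-measure need not equal the harmonic mean of the reported precision and recall.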