Topic modeling for feature location in software models: Studying both code generation and interpreted models

Bibliographic details
Main authors: Pérez Pérez, María Francisca, Lapeña, Raúl, Marcén, Ana C, Cetina Englada, Carlos
Format: Article
Language: English
Description
Abstract:
Context: Over the last 20 years, the research community has paid increasing attention to the use of topic modeling for software maintenance and evolution tasks in code. Topic modeling is a popular and promising information retrieval technique that represents topics as probability distributions over words. Latent Dirichlet Allocation (LDA) is one of the most popular topic modeling methods. However, the use of topic modeling in model-driven software development has been largely neglected. Since software models contain less noise (implementation details) than software code, they might be well suited for topic modeling.
Objective: This paper presents our LDA-guided evolutionary approach for feature location in software models. Specifically, we consider two types of software models: models for code generation and interpreted models.
Method: We evaluate our approach on two real-world industrial case studies: code-generation models for train control software, and interpreted models for a commercial video game. To study the impact on the results, we compare our approach for feature location in models against random search and against a baseline based on Latent Semantic Indexing (LSI), a popular information retrieval technique. In addition, we perform a statistical analysis of the results to show that this impact is significant. We also discuss the results in terms of data sparsity, implementation complexity, calibration, and stability.
Results: For interpreted models, our approach significantly outperforms the baseline in terms of recall, precision, and F-measure. This is not the case for code-generation models.
Conclusions: Our analysis of the results yields a recommendation for improving them. We also show that calibration approaches can be transferred from code to models.
Our findings regarding the compensation of instability have the potential to help feature location not only in models but also in code.

This work has been partially supported by the Ministry of Economy and Competitiveness (MINECO), Spain, through the Spanish National R+D+i Plan and ERDF funds under the Project ALPS (RTI2018-096411-B-I00).

Pérez Pérez, MF.; Lapeña, R.; Marcén, AC.; Cetina Englada, C. (2021). Topic modeling for feature location in software models: Studying both code generation and interpreted models. Information and Software Technology. 140. https://doi.org/10.1016/j.infsof.2021.106676
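The abstract compares the approaches in terms of recall, precision, and F-measure. As an illustration only, and assuming that a feature-location result is a set of model elements compared against an oracle (ground-truth) set of elements, these metrics are commonly computed as in the sketch below. The function name `precision_recall_f1` and the element IDs are hypothetical and are not taken from the paper's evaluation.

```python
from typing import Set, Tuple


def precision_recall_f1(retrieved: Set[str], oracle: Set[str]) -> Tuple[float, float, float]:
    """Compare a retrieved model fragment against the oracle (ground-truth) fragment.

    Both arguments are sets of model-element identifiers (hypothetical IDs
    used for illustration; not the paper's actual data format).
    """
    # Elements that the approach retrieved and that really belong to the feature.
    true_positives = len(retrieved & oracle)
    # Precision: fraction of retrieved elements that are correct.
    precision = true_positives / len(retrieved) if retrieved else 0.0
    # Recall: fraction of oracle elements that were retrieved.
    recall = true_positives / len(oracle) if oracle else 0.0
    # F-measure: harmonic mean of precision and recall (0 if both are 0).
    if precision + recall == 0:
        f_measure = 0.0
    else:
        f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


if __name__ == "__main__":
    retrieved = {"e1", "e2", "e3", "e4"}  # hypothetical candidate fragment
    oracle = {"e2", "e3", "e5"}           # hypothetical ground truth
    p, r, f = precision_recall_f1(retrieved, oracle)
    print(f"precision={p:.2f} recall={r:.2f} F-measure={f:.2f}")
    # prints "precision=0.50 recall=0.67 F-measure=0.57"
```

Using sets keeps the comparison independent of element order; a real evaluation may need a different notion of element identity depending on the modeling language.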