A novel source project and optimized training data selection approach for cross-project fault prediction: A novel source project and optimized training data

Software fault prediction (SFP) is a great tool for limiting the software testing resources allocation and enhancing the software reliability. In reality, collecting adequate historical training data for a new developing project might be a challenging task, in such cases cross-project fault predicti...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:The Journal of supercomputing 2025, Vol.81 (1)
Hauptverfasser: Manchala, Pravali, Bisi, Manjubala
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Software fault prediction (SFP) is a great tool for limiting the software testing resources allocation and enhancing the software reliability. In reality, collecting adequate historical training data for a new developing project might be a challenging task, in such cases cross-project fault prediction (CPFP) is useful. Prior studies have demonstrated transfer learning and training data selection models for CPFP. However, existing models are unstable to the source projects that are used to train the prediction model. In addition, imbalanced projects and irrelevant features are issues to review in CPFP. To address the limitations in existing CPFP models, we propose a novel optimized source data selection model for CPFP through Wilcoxon signed-rank test-based source project selection (WPS) and an optimized training data construction (optimizedTC) technique called WPSTC. We evaluate WPSTC with 31 datasets and seven performance measures and compare it with existing fault prediction models over five conventional and one ensemble machine learning model. On average, WPSTC outperforms CPFP models and can solve existing models’ sensitivity towards selected source projects and solve the imbalance and curse of dimensionality issues of CPFP models.
ISSN:0920-8542
1573-0484
DOI:10.1007/s11227-024-06750-1