DSSDPP: Data Selection and Sampling Based Domain Programming Predictor for Cross-Project Defect Prediction

Cross-project defect prediction (CPDP) refers to recognizing defective software modules in one project (i.e., target) using historical data collected from other projects (i.e., source), which can help developers find defects and prioritize their testing efforts. Unfortunately, there often exists lar...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:IEEE transactions on software engineering 2023-04, Vol.49 (4), p.1941-1963
Hauptverfasser: Li, Zhiqiang, Zhang, Hongyu, Jing, Xiao-Yuan, Xie, Juanying, Guo, Min, Ren, Jie
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Cross-project defect prediction (CPDP) refers to recognizing defective software modules in one project (i.e., target) using historical data collected from other projects (i.e., source), which can help developers find defects and prioritize their testing efforts. Unfortunately, there often exists large distribution difference between the source and target data. Most CPDP methods neglect to select the appropriate source data for a given target at the project level. More importantly, existing CPDP models are parametric methods, which usually require intensive parameter selection and tuning to achieve better prediction performance. This would hinder wide applicability of CPDP in practice. Moreover, most CPDP methods do not address the cross-project class imbalance problem. These limitations lead to suboptimal CPDP results. In this paper, we propose a novel data selection and sampling based domain programming predictor (DSSDPP) for CPDP, which addresses the above limitations. DSSDPP is a non-parametric CPDP method, which can perform knowledge transfer across projects without the need for parameter selection and tuning. By exploiting the structures of source and target data, DSSDPP can learn a discriminative transfer classifier for identifying defects of the target project. Extensive experiments on 22 projects from four datasets indicate that DSSDPP achieves better MCC and AUC results against a range of competing methods both in the single-source and multi-source scenarios. Since DSSDPP is easy, effective, extensible, and efficient, we suggest that future work can use it with the well-chosen source data to conduct CPDP especially for the projects with limited computational budget.
ISSN:0098-5589
1939-3520
DOI:10.1109/TSE.2022.3204589