Neighbor cleaning learning based cost‐sensitive ensemble learning approach for software defect prediction
Published in: | Concurrency and computation 2024-05, Vol.36 (12), p.n/a |
---|---|
Main authors: | , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Summary: | The class imbalance problem in software defect prediction datasets biases prediction results toward the majority class, and the class overlap problem blurs the boundaries of classification decisions; both degrade a model's prediction performance on the dataset. Neighbor cleaning learning (NCL) is an effective technique for defect prediction. To address the class overlap and class imbalance problems, the NCL‐based cost‐sensitive ensemble learning approach for software defect prediction (NCL_CSEL) is proposed. First, base classifiers are trained on bootstrap‐resampled data. The resulting classifiers are then integrated by a static ensemble to obtain the final classification results. The base classifier is an Adaptive Boosting (AdaBoost) classifier that combines NCL and cost‐sensitive learning; class overlap and class imbalance are addressed by balancing the proportion of overlapping samples removed by NCL against the size of the cost factor in cost‐sensitive learning. Specifically, the NCL algorithm initializes the sample weights, and the cost‐sensitive method updates them. Experiments on the NASA and AEEEM datasets show that adding the NCL algorithm can improve the defect prediction model's bal value by approximately 7% and its AUC value by 9.5%. NCL_CSEL effectively addresses the class imbalance problem and significantly improves prediction performance compared with existing methods for handling class imbalance. |
---|---|
ISSN: | 1532-0626 1532-0634 |
DOI: | 10.1002/cpe.8017 |
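
The summary reports gains in the bal value without defining it. As a minimal sketch, assuming bal refers to the balance measure widely used in defect prediction (the normalized distance from the ideal point pd = 1, pf = 0), it could be computed from confusion-matrix counts as follows. The function name `balance` and its argument names are hypothetical and do not come from the paper.

```python
import math


def balance(tp: int, fn: int, fp: int, tn: int) -> float:
    """Balance measure (assumed definition, not taken from the paper):
    bal = 1 - sqrt((0 - pf)^2 + (1 - pd)^2) / sqrt(2)
    """
    # pd: probability of detection (recall on the defective class)
    pd = tp / (tp + fn) if (tp + fn) else 0.0
    # pf: probability of false alarm (false positive rate)
    pf = fp / (fp + tn) if (fp + tn) else 0.0
    # Euclidean distance to the ideal point (pf=0, pd=1), scaled into [0, 1]
    return 1.0 - math.sqrt((0.0 - pf) ** 2 + (1.0 - pd) ** 2) / math.sqrt(2.0)
```

Under this definition, a classifier that simply predicts the majority (non-defective) class for every module has pd = 0 and pf = 0, so its bal stays near 0.29 however high its accuracy is, which is why the measure suits imbalanced defect datasets.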