CSSG: A cost‐sensitive stacked generalization approach for software defect prediction

Bibliographic details
Published in: Software Testing, Verification & Reliability, 2021-08, Vol. 31 (5)
Main authors: Eivazpour, Zeinab; Keyvanpour, Mohammad Reza
Format: Article
Language: English

Abstract: Predicting whether software artifacts belong to the defect-prone (DP) or non-defect-prone (NDP) class during the testing phase is a classification task in the software defect prediction (SDP) field, and it helps minimize software business costs. Machine learning methods are useful for this task, but they face the challenge of imbalanced class distributions. This imbalance causes serious misclassification of artifacts and degrades the predictor's performance. Previously developed stacking ensemble methods in the SDP field do not consider misclassification cost when handling the class imbalance problem (CIP) in the training dataset. To bridge this research gap, the cost-sensitive stacked generalization (CSSG) approach combines the stacking ensemble learning method with cost-sensitive learning (CSL), since the purpose of CSL is to reduce misclassification costs. In CSSG, logistic regression (LR) and extremely randomized trees classifiers, in both cost-sensitive and cost-insensitive variants, are used as the final classifier of the stacking scheme. We evaluate CSSG with six performance measures and, in several experiments, compare it with some cost-sensitive ensemble methods on 15 benchmark datasets with different imbalance levels. The results indicate that CSSG can be a more effective solution to the CIP than the other compared methods.
ISSN: 0960-0833, 1099-1689
DOI: 10.1002/stvr.1761
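The abstract gives no implementation details. To illustrate the general idea it describes, a stacking ensemble whose final classifier is trained cost-sensitively, here is a minimal pure-Python sketch. It is not the authors' CSSG. The two base learners (nearest centroid and k-NN), the cost values (5 per missed DP artifact, 1 per false alarm), the fold count, the function names, and the toy data are all hypothetical. The paper's final classifiers are LR and extremely randomized trees; this sketch shows only an LR meta-learner, made cost-sensitive by weighting each sample's log-loss by its misclassification cost.

```python
import math

# Minimal sketch of cost-sensitive stacking (NOT the paper's exact CSSG), assuming:
#  - label 1 = defect-prone (DP), label 0 = non-defect-prone (NDP);
#  - two hypothetical toy base learners (nearest centroid, k-NN);
#  - a logistic-regression meta-learner made cost-sensitive by weighting each
#    sample's log-loss by its cost (cost_fn for DP samples, cost_fp for NDP samples).

def sigmoid(z):
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def centroid_learner(X, y):
    """Hypothetical base learner: relative closeness to the DP class centroid."""
    def mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]
    c1 = mean([x for x, t in zip(X, y) if t == 1])
    c0 = mean([x for x, t in zip(X, y) if t == 0])
    def predict(x):
        d1, d0 = math.dist(x, c1), math.dist(x, c0)
        return d0 / (d0 + d1) if d0 + d1 > 0 else 0.5
    return predict

def knn_learner(X, y, k=3):
    """Hypothetical base learner: fraction of DP samples among the k nearest."""
    def predict(x):
        nearest = sorted(range(len(X)), key=lambda i: math.dist(x, X[i]))[:k]
        return sum(y[i] for i in nearest) / len(nearest)
    return predict

def fit_cost_sensitive_logreg(Z, y, cost_fn, cost_fp, lr=0.1, epochs=500):
    """Batch gradient descent on the cost-weighted log-loss."""
    w, b = [0.0] * len(Z[0]), 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for z, t in zip(Z, y):
            p = sigmoid(b + sum(wj * zj for wj, zj in zip(w, z)))
            g = (cost_fn if t == 1 else cost_fp) * (p - t)  # cost-weighted gradient
            gw = [gj + g * zj for gj, zj in zip(gw, z)]
            gb += g
        w = [wj - lr * gj / len(Z) for wj, gj in zip(w, gw)]
        b -= lr * gb / len(Z)
    return w, b

def stratified_folds(y, k):
    # deal the indices of each class round-robin so every fold keeps both classes
    folds = [[] for _ in range(k)]
    for label in (0, 1):
        members = [i for i, t in enumerate(y) if t == label]
        for pos, i in enumerate(members):
            folds[pos % k].append(i)
    return folds

def fit_stacked(X, y, base_learners, cost_fn, cost_fp, k=3):
    """Level-0 learners produce out-of-fold scores; the cost-sensitive
    meta-learner is fitted on those scores (stacked generalization)."""
    meta = [None] * len(X)
    for fold in stratified_folds(y, k):
        held = set(fold)
        Xtr = [x for i, x in enumerate(X) if i not in held]
        ytr = [t for i, t in enumerate(y) if i not in held]
        models = [fit(Xtr, ytr) for fit in base_learners]
        for i in fold:
            meta[i] = [m(X[i]) for m in models]
    w, b = fit_cost_sensitive_logreg(meta, y, cost_fn, cost_fp)
    final = [fit(X, y) for fit in base_learners]  # refit level-0 learners on all data
    def predict(x):
        z = b + sum(wj * m(x) for wj, m in zip(w, final))
        return 1 if sigmoid(z) >= 0.5 else 0
    return predict

def total_cost(y_true, y_pred, cost_fn, cost_fp):
    """Total misclassification cost: cost_fn per missed DP, cost_fp per false alarm."""
    return sum(cost_fn if t == 1 and p == 0 else cost_fp if t == 0 and p == 1 else 0
               for t, p in zip(y_true, y_pred))

# Toy imbalanced data (10 NDP vs 3 DP), invented for illustration only.
X = [[0.1, 0.2], [0.2, 0.1], [0.0, 0.3], [0.3, 0.0], [0.2, 0.2],
     [0.1, 0.0], [0.0, 0.1], [0.3, 0.2], [0.2, 0.3], [0.1, 0.1],
     [0.9, 0.8], [0.8, 0.9], [1.0, 0.7]]
y = [0] * 10 + [1] * 3
model = fit_stacked(X, y, [centroid_learner, knn_learner], cost_fn=5.0, cost_fp=1.0)
```

The meta-learner is trained on out-of-fold base-learner scores, so it never sees scores that a base learner produced on its own training samples. Raising `cost_fn` relative to `cost_fp` pushes the learned boundary toward flagging more artifacts as DP, which is the trade-off cost-sensitive learning is meant to control.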