Software quality modeling: The impact of class noise on the random forest classifier

This study investigates the impact of increasing levels of simulated class noise on software quality classification. Class noise was injected into seven software engineering measurement datasets, and the performance of three learners, random forests, C4.5, and Naive Bayes, was analyzed. The random f...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Hauptverfasser: Folleco, A., Khoshgoftaar, T.M., Van Hulse, J., Bullard, L.
Format: Tagungsbericht
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:This study investigates the impact of increasing levels of simulated class noise on software quality classification. Class noise was injected into seven software engineering measurement datasets, and the performance of three learners, random forests, C4.5, and Naive Bayes, was analyzed. The random forest classifier was utilized for this study because of its strong performance relative to well-known and commonly-used classifiers such as C4.5 and Naive Bayes. Further, relatively little prior research in software quality classification has considered the random forest classifier. The experimental factors considered in this study were the level of class noise and the percent of minority instances injected with noise. The empirical results demonstrate that the random forest obtained the best and most consistent classification performance in all experiments.
ISSN:1089-778X
1941-0026
DOI:10.1109/CEC.2008.4631321