Fault Prediction Modeling for Software Quality Estimation: Comparing Commonly Used Techniques

High-assurance and complex mission-critical software systems are heavily dependent on reliability of their underlying software applications. An early software fault prediction is a proven technique in achieving high software reliability. Prediction models based on software metrics can predict number...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Empirical software engineering : an international journal 2003-09, Vol.8 (3), p.255-283
Hauptverfasser: Khoshgoftaar, Taghi M, Seliya, Naeem
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:High-assurance and complex mission-critical software systems are heavily dependent on reliability of their underlying software applications. An early software fault prediction is a proven technique in achieving high software reliability. Prediction models based on software metrics can predict number of faults in software modules. Timely predictions of such models can be used to direct cost-effective quality enhancement efforts to modules that are likely to have a high number of faults. We evaluate the predictive performance of six commonly used fault prediction techniques: CART-LS (least squares), CART-LAD (least absolute deviation), S-PLUS, multiple linear regression, artificial neural networks, and case-based reasoning. The case study consists of software metrics collected over four releases of a very large telecommunications system. Performance metrics, average absolute and average relative errors, are utilized to gauge the accuracy of different prediction models. Models were built using both, original software metrics (RAW) and their principle components (PCA). Two-way ANOVA randomized-complete block design models with two blocking variables are designed with average absolute and average relative errors as response variables. System release and the model type (RAW or PCA) form the blocking variables and the prediction technique is treated as a factor. Using multiple-pairwise comparisons, the performance order of prediction models is determined. We observe that for both average absolute and average relative errors, the CART-LAD model performs the best while the S-PLUS model is ranked sixth.[PUBLICATION ABSTRACT]
ISSN:1382-3256
1573-7616
DOI:10.1023/A:1024424811345