Empirical Assessment of a Software Metric: The Information Content of Operators

Bibliographic Details
Published in:	Software Quality Journal, 2001-06, Vol. 9 (2), p. 99
Main Authors:	Khoshgoftaar, Taghi M.; Allen, Edward B.
Format:	Article
Language:	English
Subjects:	Accuracy; Assembly language; Case studies; Datasets; Information theory; Regression analysis; Software development; Software quality
Online Access:	Full text

Description
Abstract:	This paper presents an empirical case study that predicted faults in modules based on the total information content of the operators. This metric is closely related to Harrison's average information content classification (AICC), which is the entropy of the operators. Most information theory-based metrics proposed in the literature have not been subjected to empirical predictive studies of real-world software systems. In contrast, this study shows that a simple information theory-based metric can be more useful for prediction of software quality than comparable metrics based on counts in the context of a commercial software development organization. Three models were considered, all based on operators as an abstraction of software. The model based on information content of the operators made more accurate predictions than two similar models based on the number of operators and the number of unique operators. The purpose of this paper is a fair comparison of the three metrics, rather than developing an optimal model. We have long advocated multivariate models for industrial use. The case study considered three large commercial systems, written in assembly language, and developed consecutively by professional programmers. The first system was used to estimate parameters of the models. The subsequent two were used to evaluate the accuracy of model predictions.
ISSN:	0963-9314, 1573-1367
DOI:	10.1023/A:1016622818771
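
Illustrative sketch (not part of the catalog record or the paper): the abstract compares three operator-based metrics for a module, namely the number of operators, the number of unique operators, and the total information content of the operators. The Python below shows one plausible way to compute them. It assumes that the entropy is the Shannon entropy of the operator frequency distribution, in bits, and that the total information content is the operator count multiplied by that entropy. The record does not state these formulas, so treat both as assumptions. The function name `operator_metrics` and the sample mnemonics are hypothetical.

```python
import math
from collections import Counter


def operator_metrics(operators):
    """Compute count-based and entropy-based operator metrics for one module.

    `operators` is a sequence of operator tokens (for example, assembly
    mnemonics). The formulas are assumptions made for illustration only.
    """
    counts = Counter(operators)
    total = sum(counts.values())  # number of operator occurrences
    unique = len(counts)          # number of distinct operators

    if total == 0:
        entropy = 0.0
    else:
        # Shannon entropy of the operator frequency distribution, in bits.
        entropy = -sum(
            (f / total) * math.log2(f / total) for f in counts.values()
        )

    return {
        "total_operators": total,
        "unique_operators": unique,
        "entropy_bits": entropy,
        # Assumed definition: occurrences times average information per operator.
        "total_information_bits": total * entropy,
    }


# Hypothetical operator stream for a small assembly-language module.
sample = ["MOV", "ADD", "MOV", "JMP", "MOV", "ADD", "CMP", "JMP"]
metrics = operator_metrics(sample)
print(metrics)
```

Two modules with the same operator count can differ in this sketch's total information value when one uses its operators more evenly than the other, which is the kind of distinction a pure count cannot capture.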