Negative results for software effort estimation

Bibliographic details
Published in:	Empirical software engineering : an international journal, 2017-10, Vol. 22 (5), p. 2658-2683
Main authors:	Menzies, Tim; Yang, Ye; Mathew, George; Boehm, Barry; Hihn, Jairus
Format:	Article
Language:	English
Subjects:	Clustering; Compilers; Computer Science; Datasets; Economic models; Estimating techniques; Estimation; Interpreters; Parameter estimation; Programming Languages; Software; Software development; Software engineering; Software Engineering/Programming and Operating Systems
Online access:	Full text

Description
Summary:	More than half the literature on software effort estimation (SEE) focuses on comparisons of new estimation methods. Surprisingly, there are no studies comparing the latest state-of-the-art methods with decades-old approaches. Accordingly, this paper takes five steps to check whether new SEE methods generate better estimates than older methods. Firstly, collect effort estimation methods ranging from “classical” COCOMO (parametric estimation over a pre-determined set of attributes) to “modern” methods (reasoning via analogy using spectral-based clustering plus instance and feature selection, and a recent “baseline method” proposed in ACM Transactions on Software Engineering). Secondly, catalog the objections that led to the development of post-COCOMO estimation methods. Thirdly, characterize each of those objections as a comparison between newer and older estimation methods. Fourthly, run those comparison experiments on four COCOMO-style data sets (from 1991, 2000, 2005, 2010). Fifthly, compare the performance of the different estimators using a Scott-Knott procedure that uses (i) the A12 effect size to rule out “small” differences and (ii) a 99% confidence bootstrap procedure to check for statistically different groupings of treatments. The major negative result of this paper is that, for the COCOMO data sets, nothing we studied did any better than Boehm's original procedure. Hence, when COCOMO-style attributes are available, we strongly recommend (i) using that data and (ii) using COCOMO to generate predictions. We say this since the experiments of this paper show that, at least for effort estimation, how data is collected is more important than what learner is applied to that data.
ISSN:	1382-3256, 1573-7616
DOI:	10.1007/s10664-016-9472-2
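
Note on the A12 effect size named in the summary: the sketch below is an illustration only and is not taken from the paper. It computes the Vargha-Delaney A12 statistic, which estimates the probability that a value drawn from one sample exceeds a value drawn from another, with ties counted as half. The function names and the 0.6 cut-off for a "small" difference are assumptions made for this example; the record does not state which threshold the authors used.

```python
from typing import Sequence


def a12(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Vargha-Delaney A12: estimated P(x > y) + 0.5 * P(x == y)."""
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    # Count every pairwise comparison between the two samples.
    greater = sum(1 for x in xs for y in ys if x > y)
    ties = sum(1 for x in xs for y in ys if x == y)
    return (greater + 0.5 * ties) / (len(xs) * len(ys))


def is_small_difference(xs: Sequence[float], ys: Sequence[float],
                        threshold: float = 0.6) -> bool:
    """True when the effect is below the threshold in either direction.

    The default threshold of 0.6 is an assumed value for illustration.
    """
    a = a12(xs, ys)
    return max(a, 1.0 - a) < threshold
```

An A12 of 0.5 means the two samples are indistinguishable by this measure, while values near 0 or 1 mean one sample almost always yields the larger value. In a ranking procedure such as the one the summary describes, a check like `is_small_difference` would presumably be used to avoid separating treatments whose results differ only trivially.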