Learning Component Size Distributions for Software Cost Estimation: Models Based on Arithmetic and Shifted Geometric Means Rules

Bibliographic Details
Published in: IEEE Transactions on Software Engineering, 2022-12, Vol. 48 (12), p. 5136-5147
Main authors: Sharma, Shachi; Pendharkar, Parag C.; Karmeshu
Format: Article
Language: English
Description
Summary: Understanding the software size distribution is critical to software cost estimation using the COCOMO model and to the design of reliable production function models. This paper proposes and validates a theoretical framework based on the maximization of Shannon entropy to learn the component size distribution of software systems when partial information about the moments is given. Moment constraints are specified in the form of a shifted geometric mean, an arithmetic mean, or both geometric and arithmetic means. The models are validated using 30 real datasets. The analysis reveals that software systems whose component sizes exhibit power-law behavior are governed by the shifted geometric mean, whereas systems whose component size distribution shows exponential behavior are described by the arithmetic mean. A third type of software system is also considered, in which the component size distribution follows a gamma distribution; such systems are characterized by specifying both the arithmetic and geometric means. The study underlines that software written in modern object-oriented programming languages adheres to a power-law distribution, indicating the existence of team synergies that lead to substantial containment of software costs compared with traditional procedural programming languages.
ISSN: 0098-5589, 1939-3520
DOI: 10.1109/TSE.2021.3139216
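
The abstract above links each moment constraint to a distribution family. The following is a minimal sketch of the standard maximum-entropy argument behind that link, written for a continuous size variable x >= 0. It is not taken from the paper: the symbols \mu, c, \lambda, \alpha, and \beta are assumptions introduced here for illustration, and the paper's exact formulation (for example, a discrete size variable or a different definition of the shift) may differ.

```latex
% Maximize Shannon entropy subject to normalization and moment constraints
\max_{f}\; H[f] = -\int_{0}^{\infty} f(x)\,\ln f(x)\,dx
\quad\text{s.t.}\quad
\int_{0}^{\infty} f(x)\,dx = 1,\qquad
\int_{0}^{\infty} \phi_k(x)\,f(x)\,dx = m_k .

% Stationarity of the Lagrangian gives the general exponential-family form
f(x) = \exp\!\Big(-\lambda_0 - \sum_k \lambda_k\,\phi_k(x)\Big).

% Case 1: arithmetic mean, \phi(x) = x with E[X] = \mu  =>  exponential
f(x) = \frac{1}{\mu}\,e^{-x/\mu}.

% Case 2: shifted geometric mean, \phi(x) = \ln(x + c), c > 0  =>  shifted power law
f(x) = (\lambda - 1)\,c^{\lambda - 1}\,(x + c)^{-\lambda},\qquad \lambda > 1.

% Case 3: both means, \phi_1(x) = x and \phi_2(x) = \ln x  =>  gamma
f(x) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\,x^{\alpha - 1}\,e^{-\beta x},
\qquad \alpha > 0,\ \beta > 0.
```

In each case the Lagrange multipliers are fixed by the observed moments: in Case 1 the multiplier equals 1/\mu, while in Cases 2 and 3 the multipliers generally have to be solved for numerically from the constraint equations.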