On the analysis of power law distribution in software component sizes

Bibliographic details
Published in: Journal of software : evolution and process 2022-02, Vol. 34 (2), p. n/a
Main authors: Sharma, Shachi; Pendharkar, Parag C.
Format: Article
Language: English
Description
Summary: Component‐based software development (CBSD) is an active area of research. Ascertaining the quality of components is important for overall software quality assurance in CBSD. One of the important metrics for measuring defects, analyzability, effort, and cost in CBSD is component size. The paper presents an analytical model based on maximization of Tsallis entropy to obtain a closed-form expression for the component size distribution (maximum Tsallis entropy component size distribution, MTECSD) in steady state. It is found that the component size distribution follows a power law asymptotically. A procedure based on the generalized Jensen–Shannon measure is developed to estimate the model parameters. A detailed analysis of many popular probability distributions, along with MTECSD, is carried out on many diverse real data sets of component‐based software. The analysis reveals that the lognormal and MTECSD distributions fit component sizes well in many software systems, confirming the presence of power law behavior. Software systems whose component size distributions are described by MTECSD are in equilibrium, implying that new defects in these systems occur only occasionally. Power law behavior in component sizes also implies high variation, which makes software harder to analyze. Precise knowledge of the component size distribution also provides an alternative method to compute effort and cost estimates using a modified COCOMO model. An analytical model based on the maximum Tsallis entropy component size distribution (MTECSD) is proposed to obtain a closed-form expression for the component size distribution. The MTECSD, Pareto, lognormal, and Weibull distributions are compared over 35 datasets. The lognormal and MTECSD distributions outperform the others and are further used to compute expected software size, leading to a modified COCOMO model.
ISSN: 2047-7473, 2047-7481
DOI: 10.1002/smr.2417
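
The summary refers to maximizing Tsallis entropy and to a size distribution that follows a power law asymptotically. This record does not reproduce the paper's closed-form MTECSD expression, so the Python sketch below is illustrative only. It uses the standard Tsallis entropy definition and a generic q-exponential weight, which is the kind of form that commonly arises from Tsallis entropy maximization under a mean-type constraint. The function names and the parameters `q` and `beta` are assumptions made for this example, not the authors' notation.

```python
import math


def tsallis_entropy(probs, q):
    """Tsallis entropy S_q = (1 - sum(p_i^q)) / (q - 1).

    For q == 1 the Shannon entropy (natural log) is returned, which is
    the limit of S_q as q -> 1.
    """
    if abs(q - 1.0) < 1e-12:
        return -sum(p * math.log(p) for p in probs if p > 0)
    return (1.0 - sum(p ** q for p in probs if p > 0)) / (q - 1.0)


def q_exponential_weight(x, q, beta):
    """Unnormalized q-exponential weight [1 + (q-1)*beta*x]^(-1/(q-1)), q > 1.

    Hypothetical stand-in for a maximum-Tsallis-entropy size distribution;
    NOT the MTECSD formula from the paper.
    """
    return (1.0 + (q - 1.0) * beta * x) ** (-1.0 / (q - 1.0))


def q_exponential_pmf(sizes, q, beta):
    """Normalize the q-exponential weights over a finite set of sizes."""
    weights = [q_exponential_weight(x, q, beta) for x in sizes]
    total = sum(weights)
    return [w / total for w in weights]


def loglog_slope(q, beta, x1, x2):
    """Slope of log(weight) against log(size) between two sizes."""
    w1 = q_exponential_weight(x1, q, beta)
    w2 = q_exponential_weight(x2, q, beta)
    return (math.log(w2) - math.log(w1)) / (math.log(x2) - math.log(x1))


if __name__ == "__main__":
    pmf = q_exponential_pmf(range(1, 1001), q=1.5, beta=0.1)
    print("Tsallis entropy (q=1.5):", tsallis_entropy(pmf, 1.5))
    # For large sizes the slope approaches -1/(q-1), i.e. a power-law tail.
    print("tail slope:", loglog_slope(1.5, 0.1, 1e6, 1e7))
```

For large sizes the weight behaves like a power of the size with exponent -1/(q-1), which is one simple way to see how a Tsallis-type maximizer can produce the asymptotic power law the summary describes.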