Hierarchical Clustering for Software Architecture Recovery

Bibliographic details
Published in:	IEEE Transactions on Software Engineering, 2007-11, Vol. 33 (11), pp. 759-780
Main authors:	Maqbool, O., Babri, H.A.
Format:	Article
Language:	English
Subjects:	Algorithm design and analysis; Algorithms; and reengineering; arbitrary decisions; Architecture; Architecture (computers); architecture recovery; Cluster analysis; Clustering; Clustering algorithms; Computer architecture; Computer engineering; Computer programs; Decomposition; Digital Object Identifier; Electrical engineering; hierarchical clustering; Legacy systems; Partitioning algorithms; Recovery; Restructuring; Reverse engineering; Similarity; Software; Software algorithms; Software architecture; Software Engineering; Software measurement; Software systems; Studies; Taxonomy

Description
Abstract:	Gaining an architectural-level understanding of a software system is important for many reasons. When the description of a system's architecture does not exist, attempts must be made to recover it. In recent years, researchers have explored the use of clustering for recovering a software system's architecture, given only its source code. The main contributions of this paper are as follows. First, we review hierarchical clustering research in the context of software architecture recovery and modularization. Second, to employ clustering meaningfully, it is necessary to understand the peculiarities of the software domain, as well as the behavior of clustering measures and algorithms in this domain. To this end, we provide a detailed analysis of the behavior of various similarity and distance measures that may be employed for software clustering. Third, we analyze the clustering process of various well-known clustering algorithms by using multiple criteria, and we show how arbitrary decisions taken by these algorithms during clustering affect the quality of their results. Finally, we present an analysis of two recently proposed clustering algorithms, revealing close similarities in their apparently different clustering approaches. Experiments on four legacy software systems provide insight into the behavior of well-known clustering algorithms and their characteristics in the software domain.
ISSN:	0098-5589 1939-3520
DOI:	10.1109/TSE.2007.70732