Efficient software clustering technique using an adaptive and preventive dendrogram cutting approach

Bibliographic Details
Published in:	Information and software technology 2013-11, Vol.55 (11), p.1994-2012
Main Authors:	Chong, Chun Yong; Lee, Sai Peck; Ling, Teck Chaw
Format:	Article
Language:	English
Subjects:	Algorithms; Clustering; Computer programs; Cutting; Design recovery; Mathematical problems; Remodularization; Reverse engineering; Software; Software clustering; Software engineering; Software maintenance; Studies; Systems design
Online Access:	Full text

Description
Summary:
• We propose a complementary mechanism to improve the efficiency of software clustering.
• We performed two simulations to test the accuracy of the proposed technique.
• We found that accuracy decreases when utility classes are involved.
• We found that multiple cutting points are feasible under certain circumstances.

Software clustering is a key technique used in reverse engineering to recover a high-level abstraction of software when resources are limited. Very little research has explicitly addressed how to find the optimum set of clusters in a design or how to penalize the formation of singleton clusters during clustering. This paper attempts to enhance existing agglomerative clustering algorithms by introducing a complementary mechanism. To solve the architecture recovery problem, the proposed approach focuses on minimizing redundant effort and penalizing the formation of singleton clusters during clustering while maintaining the integrity of the results. An automated solution for cutting a dendrogram, based on least-squares regression, is presented to find the best cut level. A dendrogram is a tree diagram that shows the taxonomic relationships of clusters of software entities. In addition, a factor that penalizes clusters that would form singletons is introduced. Simulations were performed on two open-source projects. The proposed approach was compared against the exhaustive and highest gap dendrogram cutting methods, as well as two well-known cluster validity indices, namely Dunn's index and the Davies-Bouldin index. When the clustering results were compared against the original package diagram, the approach achieved an average accuracy of 90.07% across the two simulations after the utility classes were removed. Utility classes in the source code reduce the accuracy of software clustering, owing to their omnipresent behavior. The proposed approach also successfully penalized the formation of singleton clusters during clustering. The evaluation indicates that the proposed approach can enhance the quality of the clustering results by guiding software maintainers through the cutting point selection process. The proposed approach can be used as a complementary mechanism to improve the effectiveness of existing clustering algorithms.
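To make the idea of a dendrogram cut more concrete, the following is a minimal sketch of the highest gap method, one of the baselines the summary says the proposed approach was compared against. It is not the regression-based technique of the paper, whose details are not given in this record. The class names, merge heights, and function names (`highest_gap_cut`, `clusters_at`, `singleton_count`) are all hypothetical and chosen only for illustration.

```python
from typing import FrozenSet, List, Set, Tuple

# One merge step of a dendrogram: (height, left cluster, right cluster).
Merge = Tuple[float, FrozenSet[str], FrozenSet[str]]


def highest_gap_cut(merges: List[Merge]) -> float:
    """Return a cut height placed in the middle of the largest gap
    between two consecutive merge heights (needs at least two merges)."""
    heights = sorted(height for height, _, _ in merges)
    gaps = [(heights[i + 1] - heights[i], i) for i in range(len(heights) - 1)]
    _, i = max(gaps)
    return (heights[i] + heights[i + 1]) / 2


def clusters_at(entities: List[str], merges: List[Merge],
                cut_height: float) -> Set[FrozenSet[str]]:
    """Replay every merge at or below cut_height, starting from singletons."""
    clusters = {frozenset([e]) for e in entities}
    for height, left, right in sorted(merges, key=lambda m: m[0]):
        if height <= cut_height:
            clusters -= {left, right}
            clusters.add(left | right)
    return clusters


def singleton_count(clusters: Set[FrozenSet[str]]) -> int:
    """Number of clusters that contain exactly one entity."""
    return sum(1 for c in clusters if len(c) == 1)


# Made-up example: five classes and four merges (heights are assumptions).
entities = ["A", "B", "C", "D", "E"]
ab, cd = frozenset("AB"), frozenset("CD")
merges = [
    (0.10, frozenset("A"), frozenset("B")),
    (0.15, frozenset("C"), frozenset("D")),
    (0.70, ab, cd),
    (0.90, ab | cd, frozenset("E")),
]

cut = highest_gap_cut(merges)
clusters = clusters_at(entities, merges, cut)
singletons = singleton_count(clusters)
```

In this toy dendrogram the cut leaves class `E` in a cluster of its own. A count such as `singleton_count` is the kind of quantity a singleton penalty could act on, although how the paper defines its penalty factor is not stated here.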
ISSN:	0950-5849 1873-6025
DOI:	10.1016/j.infsof.2013.07.002