Semi-supervised hierarchical ensemble clustering based on an innovative distance metric and constraint information
Agglomerative Hierarchical Clustering (AHC) is a bottom-up clustering strategy in which each object is originally a cluster, and more pairs of clusters are formed by traversing the hierarchy. It has been proven that there is no individual AHC clustering algorithm that can be efficient in all situati...
Gespeichert in:
Veröffentlicht in: | Engineering applications of artificial intelligence 2023-09, Vol.124, p.106571, Article 106571 |
---|---|
Hauptverfasser: | , , , , , |
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | Agglomerative Hierarchical Clustering (AHC) is a bottom-up clustering strategy in which each object is originally a cluster, and more pairs of clusters are formed by traversing the hierarchy. It has been proven that there is no individual AHC clustering algorithm that can be efficient in all situations. In order to address this problem, ensemble clustering techniques have been introduced. These techniques combine the results of several output partitions to achieve a consensus with higher accuracy compared to an individual clustering algorithm. This paper proposes an AHC-based ensemble semi-supervised clustering algorithm to improve performance. In semi-supervised clustering, class membership information is used in some objects. Here, we introduce the Semi-Supervised Ensemble Hierarchical Clustering based on Constraints Information (SSEHCCI) algorithm. SSEHCCI is developed using several individual clustering algorithms based on AHC. SSEHCCI includes a flexible weighting policy to generate base partitions and uses the constraints information to configure the semi-supervised clustering. In addition, SSEHCCI uses an innovative distance measure to calculate the distance between each pair of objects. Experimental results show that SSEHCCI performs better than existing semi-supervised algorithms on some University of California Irvine (UCI) datasets. Specifically, we observed an average accuracy of SSEHCCI compared to SSDC and RSSC of 2.6% and 1.8%, respectively. |
---|---|
ISSN: | 0952-1976 1873-6769 |
DOI: | 10.1016/j.engappai.2023.106571 |