Discrete and Balanced Spectral Clustering With Scalability

Bibliographic details
Published in: IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023-12, Vol. 45 (12), p. 14321-14336
Main authors: Wang, Rong; Chen, Huimin; Lu, Yihang; Zhang, Qianrong; Nie, Feiping; Li, Xuelong
Format: Article
Language: English
Description
Summary: Spectral Clustering (SC) has been the subject of intensive research due to its remarkable clustering performance. Despite its successes, most existing SC methods suffer from several critical issues. First, they typically involve two independent stages, i.e., learning the continuous relaxation matrix followed by the discretization of the cluster indicator matrix. This two-stage approach can result in suboptimal solutions that negatively impact the clustering performance. Second, these methods struggle to maintain the balance property of clusters that is inherent in many real-world datasets, which restricts their practical applicability. Finally, these methods are computationally expensive and hence unable to handle large-scale datasets. In light of these limitations, we present a novel Discrete and Balanced Spectral Clustering with Scalability (DBSC) model that integrates the learning of the continuous relaxation matrix and the discrete cluster indicator matrix into a single step. Moreover, the proposed model keeps the sizes of the clusters approximately equal, thereby achieving soft-balanced clustering. In addition, the DBSC model incorporates an anchor-based strategy to improve its scalability to large-scale datasets. The experimental results demonstrate that the proposed model outperforms existing methods in terms of both clustering performance and balance performance. Specifically, the clustering accuracy of DBSC on the CMUPIE dataset shows a 17.93% improvement over that of the state-of-the-art (SOTA) methods (LABIN, EBSC, etc.).
ISSN: 0162-8828, 2160-9292, 1939-3539
DOI: 10.1109/TPAMI.2023.3311828
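
For readers unfamiliar with the two-stage pipeline that the summary criticizes, the following is a minimal sketch of conventional spectral clustering: a continuous relaxation obtained from Laplacian eigenvectors, followed by a separate k-means discretization. It is not the DBSC model from the article. The function name, the Gaussian-kernel graph, the `sigma` parameter, and the farthest-point initialization are all illustrative assumptions.

```python
import numpy as np


def two_stage_spectral_clustering(X, k, sigma=1.0, n_iter=100):
    """Conventional two-stage SC baseline (illustrative, not DBSC)."""
    n = X.shape[0]

    # Gaussian-kernel affinity matrix (an assumed, illustrative graph choice).
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)

    # Symmetric normalized Laplacian L = I - D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1) + 1e-12)
    L = np.eye(n) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

    # Stage 1: continuous relaxation. The k eigenvectors with the smallest
    # eigenvalues act as a real-valued stand-in for the indicator matrix.
    _, eigvecs = np.linalg.eigh(L)  # eigenvalues are returned in ascending order
    F = eigvecs[:, :k]
    F = F / (np.linalg.norm(F, axis=1, keepdims=True) + 1e-12)

    # Stage 2: discretization by k-means on the rows of F, using a
    # deterministic farthest-point initialization of the centers.
    centers = [F[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((F - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(F[np.argmax(dist)])
    centers = np.array(centers)

    labels = np.zeros(n, dtype=int)
    for _ in range(n_iter):
        dists = np.sum((F[:, None, :] - centers[None, :, :]) ** 2, axis=-1)
        new_labels = np.argmin(dists, axis=1)
        for j in range(k):
            if np.any(new_labels == j):  # leave an empty cluster's center as is
                centers[j] = F[new_labels == j].mean(axis=0)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels
```

Calling `np.bincount(labels, minlength=k)` on the result gives the cluster sizes, which is one simple way to inspect balance. Because stage 2 never feeds back into stage 1, nothing in this baseline enforces balanced sizes. The dense n-by-n affinity matrix and full eigendecomposition also illustrate the computational cost that the summary says limits such methods on large datasets.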