Scalable Co-Clustering for Large-Scale Data through Dynamic Partitioning and Hierarchical Merging
Main authors: | , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | Co-clustering simultaneously clusters rows and columns, revealing more
fine-grained groups. However, existing co-clustering methods suffer from poor
scalability and cannot handle large-scale data. This paper presents a novel and
scalable co-clustering method designed to uncover intricate patterns in
high-dimensional, large-scale datasets. Specifically, we first propose a large
matrix partitioning algorithm that partitions a large matrix into smaller
submatrices, enabling parallel co-clustering. This method employs a
probabilistic model to optimize the configuration of submatrices, balancing the
computational efficiency and depth of analysis. Additionally, we propose a
hierarchical co-cluster merging algorithm that efficiently identifies and
merges co-clusters from these submatrices, enhancing the robustness and
reliability of the process. Extensive evaluations validate the effectiveness
and efficiency of our method. Experimental results demonstrate a significant
reduction in computation time, with an approximate 83% decrease for dense
matrices and up to 30% for sparse matrices. |
---|---|
DOI: | 10.48550/arxiv.2410.18113 |
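
The summary describes a two-stage pipeline: partition a large matrix into submatrices that can be co-clustered in parallel, then merge the co-clusters found in those submatrices. The record does not give the algorithms themselves, so the following is only a minimal sketch of the general idea, under loudly stated assumptions: a fixed grid partition stands in for the paper's probabilistic partitioning, and a greedy Jaccard-overlap rule stands in for its hierarchical merging. The names `partition_matrix`, `merge_coclusters` and `threshold` are hypothetical and do not come from the paper.

```python
from itertools import product


def partition_indices(n, size):
    """Split range(n) into consecutive chunks of at most `size` indices."""
    return [list(range(start, min(start + size, n))) for start in range(0, n, size)]


def partition_matrix(n_rows, n_cols, block_rows, block_cols):
    """Return one (row_ids, col_ids) pair per submatrix of a grid partition.

    Assumption: a plain grid, not the probabilistic configuration
    mentioned in the summary.
    """
    row_chunks = partition_indices(n_rows, block_rows)
    col_chunks = partition_indices(n_cols, block_cols)
    return list(product(row_chunks, col_chunks))


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def merge_coclusters(coclusters, threshold=0.5):
    """Greedily merge co-clusters whose row sets are similar.

    Each co-cluster is a (row_set, col_set) pair in global index space.
    Two co-clusters are merged when the Jaccard similarity of their row
    sets is at least `threshold`; rows and columns are both unioned.
    Passes repeat until no pair qualifies. This is an illustrative
    stand-in, not the paper's hierarchical merging algorithm.
    """
    merged = [(set(r), set(c)) for r, c in coclusters]
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                (ri, ci), (rj, cj) = merged[i], merged[j]
                if jaccard(ri, rj) >= threshold:
                    merged[i] = (ri | rj, ci | cj)
                    del merged[j]
                    changed = True
                    break
            if changed:
                break
    return merged
```

In such a sketch, each `(row_ids, col_ids)` block would be handed to any co-clustering routine (for example in a worker pool), the resulting co-clusters mapped back to global indices, and then passed to `merge_coclusters`. Smaller blocks give more parallelism but leave more fragments for the merge step to reassemble, which is one way to read the trade-off between computational efficiency and depth of analysis that the summary mentions.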