Distributed Peer-to-Peer Cooperative Partitional-Divisive Clustering for gene expression datasets

Bibliographic details
Main authors: Kashef, R., Kamel, M.S.
Format: Conference proceeding
Language: English
Description
Summary: Clustering techniques are helpful in understanding gene regulation, cellular processes, and subtypes of cells. A major thrust of gene expression analysis over the last twenty years has been the acquisition of an enormous number of gene expression datasets from various distributed sources. Thus, it is becoming increasingly important to cluster distributed data in place, without first pooling it at a central node. The general goal of distributed clustering is to achieve a speedup over centralized approaches. A recent study shows that centralized cooperative clustering outperforms non-cooperative centralized clustering approaches. In this paper, a novel distributed cooperative partitional-divisive clustering (CPDC) approach in a peer-to-peer network is presented. The distributed CPDC approach is based on intermediate cooperation between partitional k-means and divisive bisecting k-means in a distributed peer-to-peer network to produce better global solutions. Computational experiments were conducted to test the performance of the distributed CPDC approach on different gene expression datasets. The experimental results show that the distributed CPDC method performs better than non-cooperative distributed k-means and distributed bisecting k-means. Thus, a new cooperative technique for distributed gene expression repositories is presented for efficiently discovering regularities and genes that may span multiple nodes.
DOI: 10.1109/CIBCB.2008.4675771
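
The summary names two base algorithms, partitional k-means and divisive bisecting k-means. As a rough illustration of how they differ, the following is a minimal single-node sketch in plain Python. It is not the authors' CPDC method: the cooperation step and the peer-to-peer exchange are not described in this record and are not reproduced here. All function names (`kmeans`, `bisecting_kmeans`, `sse`) and parameters are hypothetical, and the sketch assumes distinct points, squared Euclidean distance, and `k` no larger than the number of points.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of tuples."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def sse(cluster):
    """Sum of squared distances from each point to the cluster mean."""
    if not cluster:
        return 0.0
    center = mean(cluster)
    return sum(sq_dist(p, center) for p in cluster)


def kmeans(points, k, iters=50, seed=0):
    """Partitional step: Lloyd-style k-means, returns k lists of points."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumption: random initial centroids
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster happens to be empty.
        updated = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if updated == centroids:  # assignments have stabilised
            break
        centroids = updated
    return clusters


def bisecting_kmeans(points, k, seed=0):
    """Divisive step: repeatedly split the cluster with the largest SSE in two."""
    clusters = [list(points)]
    while len(clusters) < k:
        worst = max(range(len(clusters)), key=lambda i: sse(clusters[i]))
        target = clusters.pop(worst)
        halves = [h for h in kmeans(target, 2, seed=seed) if h]
        clusters.extend(halves)
    return clusters
```

Calling `kmeans(rows, 3)` or `bisecting_kmeans(rows, 3)` on a list of expression profiles (one tuple of floats per gene) returns three lists of profiles. k-means partitions all points at once, while bisecting k-means builds the same number of clusters through successive two-way splits. Choosing the split target by largest SSE is one common convention, assumed here for illustration.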