Parallelizing an Information Theoretic Co-clustering Algorithm Using a Cloud Middleware

The emerging cloud environments are well suited for storage and analysis of large datasets, since they can allow on-demand access to resources. However, developing high-performance implementations of data analysis tasks is a challenging problem. In our prior work, we have developed a middleware call...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Hauptverfasser:	Ramanathan, V, Wenjing Ma, Ravi, V T, Tantan Liu, Agrawal, G
Format:	Tagungsbericht
Sprache:	eng
Schlagworte:	Algorithm design and analysis Arrays Classification algorithms Clustering algorithms Co-clustering Data mining Distributed databases Middleware Parallel Data Mining
Online-Zugang:	Volltext bestellen
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	The emerging cloud environments are well suited for storage and analysis of large datasets, since they can allow on-demand access to resources. However, developing high-performance implementations of data analysis tasks is a challenging problem. In our prior work, we have developed a middleware called FREERIDE (FRamework for Rapid Implementation of Data mining Engines). FREERIDE is based upon the observation that the processing structure of a large number of data mining algorithms involves generalized reductions. FREERIDE offers a high-level interface and implements both distributed memory and shared memory parallelization. In this paper, we consider a challenging new data mining algorithm, information theoretic co-clustering, and parallelize it using the FREERIDE middleware. We show how the main processing loops of row clustering and column clustering of the Co-clustering algorithm can essentially be fit into a generalized reduction structure. We achieve good parallel efficiency, with a speedup of nearly 21 on 32 cores.
ISSN:	2375-9232 2375-9259
DOI:	10.1109/ICDMW.2010.100