Parallel Text Clustering Based on MapReduce

Bibliographic details
Main authors: Cao Zewen, Zhou Yao
Format: Conference proceeding
Language: English
Description
Summary: This paper analyzes the challenges facing ordinary text clustering algorithms and proposes cloud computing as a feasible solution. The classical Jarvis-Patrick (JP) algorithm is adapted as a case study. It is implemented in the MapReduce programming model and tested on the Hadoop cloud computing platform with the Sogou corpus provided by Sogou Laboratory. The experimental results demonstrate that the text clustering algorithm can be parallelized in the MapReduce framework, and that the parallel algorithm can handle massive textual data with better time performance.
DOI: 10.1109/CGC.2012.128
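
Illustrative sketch: the summary names Jarvis-Patrick clustering and a MapReduce decomposition but gives no implementation details, so the following is only a minimal single-machine sketch of the general technique, not the authors' method. It assumes documents are sparse term-weight dictionaries compared by cosine similarity, and that two documents join the same cluster when each appears in the other's k-nearest-neighbour list and the two lists share at least `k_min` members. The names `knn_map`, `jarvis_patrick`, `k`, and `k_min` are hypothetical. The per-document neighbour computation plays the role of a map step, and the pairwise merge plays the role of a reduce step.

```python
import math
from collections import defaultdict
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def knn_map(doc_id, docs, k):
    """Map-like step: return (doc_id, set of its k most similar documents)."""
    sims = [(cosine(docs[doc_id], vec), other)
            for other, vec in docs.items() if other != doc_id]
    # Highest similarity first; ties broken by id so results are deterministic.
    sims.sort(key=lambda p: (-p[0], p[1]))
    return doc_id, {other for _, other in sims[:k]}


def jarvis_patrick(docs, k, k_min):
    """Reduce-like step: merge documents that pass the shared-neighbour test."""
    neighbours = dict(knn_map(d, docs, k) for d in docs)

    # Union-find over document ids to collect connected components.
    parent = {d: d for d in docs}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(sorted(docs), 2):
        mutual = a in neighbours[b] and b in neighbours[a]
        if mutual and len(neighbours[a] & neighbours[b]) >= k_min:
            parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for d in docs:
        clusters[find(d)].add(d)
    return sorted(sorted(c) for c in clusters.values())


if __name__ == "__main__":
    # Toy corpus (made up for illustration): two groups with disjoint vocabulary.
    docs = {
        "a1": {"x": 1}, "a2": {"x": 1, "y": 1}, "a3": {"x": 1, "z": 1},
        "b1": {"p": 1}, "b2": {"p": 1, "q": 1}, "b3": {"p": 1, "r": 1},
    }
    print(jarvis_patrick(docs, k=2, k_min=1))
```

In an actual Hadoop job the neighbour lists would be emitted as key-value pairs and merged across reducers; how the paper partitions that work is not stated in this record.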