Parallel Text Clustering Based on MapReduce
Main authors: | , |
---|---|
Format: | Conference proceedings |
Language: | English |
Subjects: | |
Summary: | This paper analyzes the challenges facing conventional text clustering algorithms and proposes cloud computing as a feasible solution. The classical Jarvis-Patrick (JP) algorithm was adapted as a case study. It was implemented in the MapReduce programming model and tested on the Hadoop cloud computing platform with the Sogou corpus provided by Sogou laboratory. The experimental results demonstrate that a text clustering algorithm can be parallelized in the MapReduce framework, and that the parallel algorithm can handle massive textual data with better time performance. |
---|---|
DOI: | 10.1109/CGC.2012.128 |