Parallel Collection of Live Data Using Hadoop

Bibliographic details
Main authors: Talattinis, K; Sidiropoulou, A; Chalkias, K; Stephanides, G
Format: Conference paper
Language: English
Description
Summary: Hadoop is a fault-tolerant Java framework that supports data distribution and process parallelization on commodity hardware. Building on the scalability it provides and on the independence of task execution, we combined Hadoop with crawling techniques to implement various applications that deal with large amounts of data. Our experiments show that Hadoop is a very useful and trustworthy tool for creating distributed programs that perform better in terms of computational efficiency.
DOI: 10.1109/PCI.2010.47