A Distributed Parallel Algorithm for Web Page Inverted Indexes Construction on the Cluster Computing Systems

Bibliographic Details
Main authors: Liang Zhengyou, Chen Tao
Format: Conference Proceeding
Language: English
Description
Summary: To address the slow indexing speed of serial algorithms for constructing Web page inverted indexes, and noting that merge sort fits the theory of divisible load scheduling in parallel and distributed systems, this paper proposes a new parallel algorithm based on a triple sort-merge for constructing Web page inverted indexes. The algorithm distributes and parallelizes the two most time-consuming tasks in inverted index construction: parsing terms and sorting the term postings. Each term is represented as a triple, and the time complexity of the algorithm is analyzed. Using the Java middleware ProActive, the authors also design and implement a distributed parallel Web page indexer, P_Indexer, on cluster computing systems. The algorithm analysis and experimental results show that the parallel algorithm achieves high efficiency and good scalability.
DOI:10.1109/IFITA.2009.553
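
Illustrative sketch (not taken from the paper): the summary describes workers that parse terms and sort term postings in parallel, followed by a merge of the sorted results. The Python example below shows one way that general sort-merge pattern could look. Several things here are assumptions: the triple layout (term, doc_id, term_frequency), the function names parse_and_sort and build_index, the use of local threads as stand-ins for cluster nodes, and the equal-sized split of pages. The summary does not state what the three fields of the triple are, and a scheduler based on divisible load theory would likely size each chunk according to node and link speed rather than splitting evenly. P_Indexer itself is built on ProActive in Java, which this sketch does not use.

```python
import heapq
import re
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def parse_and_sort(chunk):
    """Worker step: parse (doc_id, text) pages into a sorted list of triples.

    Assumed triple layout: (term, doc_id, term_frequency).
    """
    triples = []
    for doc_id, text in chunk:
        counts = defaultdict(int)
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            counts[term] += 1
        triples.extend((term, doc_id, tf) for term, tf in counts.items())
    triples.sort()  # local sort on each worker
    return triples


def build_index(pages, workers=2):
    """Split pages among workers, then k-way merge their sorted runs."""
    # Equal round-robin split; a simplification of load scheduling.
    chunks = [pages[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        runs = list(pool.map(parse_and_sort, chunks))
    index = {}
    # heapq.merge streams the sorted runs in global (term, doc_id) order.
    for term, doc_id, tf in heapq.merge(*runs):
        index.setdefault(term, []).append((doc_id, tf))
    return index


pages = [
    (1, "parallel index construction"),
    (2, "inverted index for web pages"),
    (3, "parallel merge sort"),
]
index = build_index(pages, workers=2)
```

Because every run is already sorted by (term, doc_id), the merge step only needs a single pass, and each posting list in the resulting dictionary comes out ordered by document ID without a further global sort.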