A Novel Distributed Web Crawling Approach Based on Mongodb

Bibliographic Details
Published in: International journal of advancements in computing technology 2013-03, Vol. 5 (6), p. 794-801
Main authors: Huailin, Dong, Amei, Wu, Qingfeng, Wu, Xiaodan, Zhu
Format: Article
Language: English
Online access: Full text
Description
Abstract: The crawler is an important part of a web search engine, but most crawlers, as a single-point technology, are limited by their hardware in the value they can deliver. How to build an efficient, stable, and scalable distributed crawling system has therefore become an important issue. This paper describes a distributed crawling technology built on MongoDB and shows how it is used in practice to build a small-scale distributed system. Compared with single-point crawling technology and with similar NoSQL products, the technology is reported to be more efficient, more stable, and easier to expand.
ISSN: 2005-8039, 2233-9337
DOI: 10.4156/ijact.vol5.issue6.93