An optimized crawling technique for maintaining fresh repositories

Bibliographic details
Published in: Multimedia tools and applications, 2021-03, Vol. 80 (7), p. 11049-11077
First author: Sethi, Shilpa
Format: Article
Language: English
Description
Summary: With the rapid increase in demand for digital information via the internet, it has become imperative for search engines to serve up-to-date information in response to user queries. A web crawler plays a vital role in maintaining the local cache of a search engine. Today, the biggest challenge for a web crawler is keeping the information fresh in its local cache of n constantly changing web pages. These web pages often have dynamic creation and update cycles. Moreover, the resources for downloading possible updates are limited. This problem is formulated as a non-deterministic optimization problem. Researchers have made many attempts to solve it, but most existing techniques work well only for small values of n and are often intractable for a large corpus. The paper presents an optimal solution to this non-deterministic problem for large data. The experimental results show that the technique achieves promising results compared with existing crawlers.
ISSN: 1380-7501, 1573-7721
DOI: 10.1007/s11042-020-10250-8