Topic-Specific Crawling on the Web with Concept Context Graph Based on FCA

Bibliographic Details
Main Authors: Qiangqiang Peng, Yajun Du, Yufeng Hai, Shaoming Chen, Zhaoqiong Gao
Format: Conference Proceeding
Language: English
Description
Summary: Topic-specific crawling does not crawl every Web page; it crawls only the pages that are related to users' interests. Pages that are highly relevant to those interests should be crawled first. The major problem in focused crawling is how to assign proper credit to the unvisited pages that the crawler will visit. In this paper, we propose an effective approach to this problem that uses a concept context graph based on formal concept analysis. We build a concept lattice from the visited pages, and then use a term-combination method to construct our concept context graph based on the upper concept lattice. Our crawler can measure a page's expected relevancy to a given topic and determine the order in which pages should be visited. An experiment illustrates that the new method is an effective mechanism that achieves considerable results.
DOI: 10.1109/ICMSS.2009.5302301
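
The prioritisation idea in the summary can be illustrated with a small sketch. It is not the authors' implementation: this record does not say how the concept context graph is constructed or scored, so the Jaccard-based `relevance` function, the toy page and term data, the example URLs, and all function names below are assumptions made for illustration. The sketch only shows the general shape: derive formal concepts (extent, intent pairs) from visited pages, score each unvisited link against the concept intents that mention the topic, and visit the highest-scoring link first.

```python
import heapq
from itertools import combinations

def intent(pages, context):
    """Terms shared by every page in `pages` (all terms when `pages` is empty)."""
    shared = set().union(*context.values())
    for page in pages:
        shared &= context[page]
    return frozenset(shared)

def extent(terms, context):
    """Pages that contain every term in `terms`."""
    return frozenset(p for p, page_terms in context.items() if terms <= page_terms)

def formal_concepts(context):
    """Naive enumeration: close every subset of pages.

    Exponential in the number of pages, so only suitable for toy inputs.
    """
    concepts = set()
    pages = list(context)
    for size in range(len(pages) + 1):
        for subset in combinations(pages, size):
            b = intent(subset, context)
            concepts.add((extent(b, context), b))
    return concepts

def relevance(link_terms, topic_terms, concepts):
    """Hypothetical score: best Jaccard overlap between the link's terms and
    the intent of any concept that mentions at least one topic term."""
    best = 0.0
    for _, concept_intent in concepts:
        if not (concept_intent & topic_terms):
            continue
        union = concept_intent | link_terms
        if union:
            best = max(best, len(concept_intent & link_terms) / len(union))
    return best

def crawl_order(links, topic_terms, concepts):
    """Return link URLs ordered from most to least relevant (best-first)."""
    heap = [(-relevance(terms, topic_terms, concepts), url)
            for url, terms in links.items()]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]

# Toy formal context: visited pages and the terms found on them (made up).
visited = {
    "p1": {"crawler", "web", "topic"},
    "p2": {"crawler", "lattice"},
    "p3": {"recipe", "web"},
}
# Unvisited links with terms taken from, say, their anchor text (made up).
links = {
    "http://example.org/a": {"crawler", "lattice"},
    "http://example.org/b": {"recipe", "cooking"},
}

concepts = formal_concepts(visited)
order = crawl_order(links, {"crawler"}, concepts)
print(order)
```

A real crawler would replace the exhaustive enumeration with an incremental lattice algorithm and would re-score the frontier as newly visited pages change the lattice.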