Topic-Specific Crawling on the Web with Concept Context Graph Based on FCA
Main authors: | , , , , |
---|---|
Format: | Conference proceedings |
Language: | English |
Subject headings: | |
Abstract: | Topic-specific (focused) crawling does not try to crawl every Web page; it crawls only the pages related to a user's interests, and pages highly relevant to those interests should be crawled first. The major problem in focused crawling is how to assign appropriate credit to the unvisited pages the crawler may visit next. In this paper, we propose an effective approach that uses a concept context graph based on formal concept analysis (FCA) to solve this problem. We build a concept lattice from the visited pages, and then use a term-combination method to construct our concept context graph based on the upper concept lattice. Our crawler can estimate a page's expected relevance to a given topic and determine the order in which pages should be visited. An experiment shows that the new method is an effective mechanism that yields considerable results. |
DOI: | 10.1109/ICMSS.2009.5302301 |
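The abstract describes two steps: building a concept lattice from the visited pages, then using it to rank unvisited pages. The Python sketch below shows only the general shape of that idea and is not the paper's algorithm. It makes several assumptions. Each visited page is reduced to a set of terms. Formal concepts are enumerated naively as intersections of page intents. The authors' concept context graph is replaced by a hypothetical score: the Jaccard similarity between a concept's intent and the topic terms, multiplied by the fraction of that intent matched by a candidate link's terms. The functions `concept_intents`, `extent`, `concepts`, and `order_frontier` are illustrative names only.

```python
import heapq
from itertools import count


def concept_intents(context):
    """All closed intents of a formal context (page -> set of terms).

    The intents are exactly the intersections of the object intents, plus the
    full term set M (the intent of the empty extent). Naive, exponential in the
    worst case; fine for a small illustration.
    """
    intents = {frozenset().union(*context.values())}  # start with M
    for terms in context.values():
        # Intersect this page's terms with every intent found so far.
        intents |= {frozenset(terms) & b for b in intents}
    return intents


def extent(context, intent):
    """Pages whose term sets contain every term of the intent."""
    return frozenset(p for p, terms in context.items() if intent <= terms)


def concepts(context):
    """Every formal concept as an (extent, intent) pair."""
    return [(extent(context, b), b) for b in concept_intents(context)]


def order_frontier(context, topic, candidates):
    """Order unvisited URLs by a hypothetical concept-based relevance score."""
    intents = [b for b in concept_intents(context) if b]  # skip the empty intent

    def score(terms):
        best = 0.0
        for b in intents:
            rel = len(b & topic) / len(b | topic)  # concept's similarity to the topic
            cover = len(b & terms) / len(b)        # share of the concept the link matches
            best = max(best, rel * cover)
        return best

    heap, tie = [], count()  # counter breaks ties in insertion order
    for url, terms in candidates.items():
        heapq.heappush(heap, (-score(set(terms)), next(tie), url))
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]


visited = {
    "p1": {"crawler", "focused", "topic"},
    "p2": {"crawler", "lattice", "fca"},
    "p3": {"fca", "lattice", "concept"},
}
topic = {"focused", "crawler"}
candidates = {
    "u_a": {"focused", "crawler"},
    "u_b": {"concept", "lattice"},
    "u_c": {"sports"},
}
print(order_frontier(visited, topic, candidates))  # ['u_a', 'u_b', 'u_c']
```

In this toy context the concept with intent {crawler} (extent p1, p2) gives `u_a` its highest score. `u_b` only matches concepts that are weakly related to the topic, and `u_c` matches none. A real crawler would rebuild or update the lattice incrementally as pages are fetched rather than recomputing it for every frontier ordering.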