Semantic ranking of web pages based on formal concept analysis

Bibliographic details
Published in: The Journal of systems and software, 2013-01, Vol. 86 (1), p. 187-197
Main authors: Du, YaJun; Hai, YuFeng
Format: Article
Language: English
Description
Summary:
► Based on users' web logs, a semantic ranking of web pages is defined.
► Extension similarity and intension similarity are defined.
► The information content similarity between two nouns is computed automatically.
► A semantic similarity between two concepts in different concept lattices is developed.

A web crawler is an important component of a search engine. This paper proposes a new method for measuring the similarity of formal concept analysis (FCA) concepts and a new notion of a web page's rank, both using an information content approach based on users' web logs. First, an extension similarity and an intension similarity are proposed that analyze a user's browsing pattern and the associated hyperlinks. Second, the information content similarity between two nouns is computed automatically by examining their IS-A and Part-Of hierarchies and using the user's web log. Third, a method is proposed for computing the semantic similarity between two concepts in two different concept lattices (the base concept lattice and the current concept lattice) and for finding the semantic ranking of web pages. Finally, the experiment shows that the resulting crawler is better suited to crawling focused web pages, and that the semantic ranking of web pages is useful and efficient for guiding a web crawler's choice of the next web page to crawl.
ISSN: 0164-1212, 1873-1228
DOI: 10.1016/j.jss.2012.07.040
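
Illustrative sketch (not taken from the article): the summary describes comparing a concept from a base concept lattice with concepts from a current lattice through an extension similarity and an intension similarity, then ranking pages by the result. The record does not give the paper's formulas, so the example below substitutes plain Jaccard overlap for both similarities and omits the information content component entirely. The names `Concept`, `concept_similarity`, `rank_pages`, the weight `w`, and the sample pages are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """A formal concept: a set of objects (extent) and the attributes they share (intent)."""
    extent: frozenset
    intent: frozenset


def jaccard(a, b):
    """Jaccard overlap |a & b| / |a | b|; defined here as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def concept_similarity(c1, c2, w=0.5):
    """Weighted mix of extent overlap and intent overlap.

    Assumption: Jaccard stands in for the paper's extension and intension
    similarity measures, and w is a hypothetical mixing weight.
    """
    return w * jaccard(c1.extent, c2.extent) + (1 - w) * jaccard(c1.intent, c2.intent)


def rank_pages(base, candidates, w=0.5):
    """Sort candidate pages by similarity of their concept to the base concept, best first."""
    scored = [(url, concept_similarity(base, c, w)) for url, c in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical data: extents are visited-page ids, intents are keywords.
base = Concept(frozenset({"p1", "p2"}), frozenset({"lattice", "crawler"}))
candidates = {
    "page_a": Concept(frozenset({"p1", "p2"}), frozenset({"lattice", "crawler"})),
    "page_b": Concept(frozenset({"p1", "p3"}), frozenset({"crawler", "ranking"})),
    "page_c": Concept(frozenset({"p4"}), frozenset({"cooking"})),
}
ranking = rank_pages(base, candidates)
for url, score in ranking:
    print(f"{url}: {score:.3f}")
```

A focused crawler could use such a ranking to decide which candidate page to fetch next; replacing `jaccard` with measures derived from web logs and a noun hierarchy would bring the sketch closer to what the summary describes.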