Web page analysis using multiple graphs
A collection of web pages is modeled as a directed graph, in which the nodes of the graph are the web pages and directed edges are hyperlinks. Web pages can also be represented by content, or by other features, to obtain a similarity graph over the web pages, where nodes again denote the web pages a...
Gespeichert in:
Hauptverfasser: | , |
---|---|
Format: | Patent |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | A collection of web pages is modeled as a directed graph, in which the nodes of the graph are the web pages and directed edges are hyperlinks. Web pages can also be represented by content, or by other features, to obtain a similarity graph over the web pages, where nodes again denote the web pages and the links or edges between each pair of nodes is weighted by a corresponding similarity between those two nodes. A random walk is defined for each graph, and a mixture of the random walks is obtained for the set of graphs. The collection of web pages is then analyzed based on the mixture to obtain a web page analysis result. The web page analysis result can be, for example, clustering of the web pages to discover web communities, classifying or categorizing the web pages, or spam detection indicating whether a given web page is spam or content. |
---|