Measuring structural similarity among web documents: preliminary results
Format: | Conference proceedings |
---|---|
Language: | English |
Online access: | Full text |
Abstract: | When we describe a Web page informally, we often use phrases like “it looks like a newspaper site”, “there are several unordered lists” or “it's just a collection of links”. Unfortunately, no Web search or classification tools provide the capability to retrieve information using such informal descriptions that are based on the appearance, i.e., structure, of the Web page. In this paper, we take a look at the concept of structurally similar Web pages. We note that some structural properties can be identified with semantic properties of the data and provide measures for comparison between HTML documents. |
---|---|
ISSN: | 0302-9743 (print), 1611-3349 (electronic) |
DOI: | 10.1007/BFb0053296 |