Measuring structural similarity among web documents: preliminary results

Bibliographic Details
Main Authors: Cruz, Isabel F., Borisov, Slava, Marks, Michael A., Webb, Timothy R.
Format: Conference Proceeding
Language: English
Online Access: Full text
Description
Summary: When we describe a Web page informally, we often use phrases like “it looks like a newspaper site”, “there are several unordered lists” or “it's just a collection of links”. Unfortunately, no Web search or classification tools provide the capability to retrieve information using such informal descriptions that are based on the appearance, i.e., structure, of the Web page. In this paper, we take a look at the concept of structurally similar Web pages. We note that some structural properties can be identified with semantic properties of the data and provide measures for comparison between HTML documents.
ISSN: 0302-9743, 1611-3349
DOI: 10.1007/BFb0053296