EaSd: A System for Extracting and Annotating Structured Data

Bibliographic details
Main authors: Huibin Zhang, Xiaojie Yuan, Zongyun Yang, Yanlong Wen
Format: Conference paper
Language: English
Description
Abstract: Many Web pages are generated dynamically in response to online queries. These pages contain structured data that are useful for information integration. In this paper, we propose a system, EaSd, that automatically extracts data records from such Web pages and annotates the record attributes. Using VIPS (VIsion-based Page Segmentation) to represent the Web pages, we handle both problems in a single, uniform process based on the query instance. For data extraction, VIPS is a better Web page representation than the tag tree and produces extraction results that correspond better to the data records on the page. EaSd annotates the record attributes using an integrated interface schema, which yields more consistent and complete annotation results. Experimental results show the promise of our approach.
ISSN: 2155-6083, 2155-6091
DOI: 10.1109/GCIS.2009.81
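The abstract describes annotation based on the query instance only at a high level. As a minimal illustration, assuming that data records have already been extracted as lists of field strings and that the query instance is a mapping from an interface-schema attribute to the value submitted in the search form, the sketch below labels a field position with an attribute when the submitted value appears at that position in a majority of the records. The `Record` type, the `annotate` function, the sample records, and the majority rule are hypothetical choices for this example, not EaSd's published algorithm.

```python
from dataclasses import dataclass


@dataclass
class Record:
    # Hypothetical: one extracted data record, one text string per field position.
    fields: list[str]


def annotate(records: list[Record], query_instance: dict[str, str]) -> dict[int, str]:
    """Map field positions to interface-schema attribute names (illustrative only).

    query_instance maps an attribute (e.g. "title") to the value submitted in
    the query form. A position is labeled with that attribute when the value
    occurs (case-insensitively) at that position in more than half the records.
    """
    if not records:
        return {}
    width = max(len(r.fields) for r in records)
    labels: dict[int, str] = {}
    for attr, value in query_instance.items():
        needle = value.lower()
        best_pos, best_hits = None, 0
        for pos in range(width):
            # Count records whose field at this position contains the query value.
            hits = sum(
                1
                for r in records
                if pos < len(r.fields) and needle in r.fields[pos].lower()
            )
            if hits > best_hits:
                best_pos, best_hits = pos, hits
        # Majority rule (an assumption of this sketch) to avoid spurious matches.
        if best_pos is not None and best_hits * 2 > len(records):
            labels[best_pos] = attr
    return labels


# Illustrative records returned for a hypothetical query title="mining".
sample = [
    Record(["Data Mining Basics", "A. Smith", "$45.00"]),
    Record(["Web Data Mining", "B. Lee", "$60.00"]),
    Record(["Mining the Web", "C. Wu", "$52.00"]),
]
print(annotate(sample, {"title": "mining"}))  # {0: 'title'}
```

Positions that no query value matches (here the author and price columns) stay unlabeled. According to the abstract, EaSd uses an integrated interface schema to obtain more complete annotations, which this sketch does not attempt.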