EaSd: A System for Extracting and Annotating Structured Data
Main authors: | , , , |
---|---|
Format: | Conference proceeding |
Language: | English |
Summary: | Many Web pages are generated dynamically in response to an online query. These pages contain structured data that is useful for information integration. In this paper, we propose a system, EaSd, that automatically extracts data records from such Web pages and annotates the record attributes. Using VIPS (Vision-based Page Segmentation) as the representation of the Web pages, EaSd handles both problems in a single uniform process based on the query instance. For data extraction, VIPS represents Web pages better than a tag tree and improves the extraction results. EaSd annotates the record attributes using an integrated interface schema, which gives more consistent and complete annotations. Experimental results show the promise of our approach. |
---|---|
ISSN: | 2155-6083, 2155-6091 |
DOI: | 10.1109/GCIS.2009.81 |