Gathering meta-data and instances from object referral lists on the web

Bibliographic details
Published in: Online information review 2006-01, Vol.30 (3), p.278-296
Main authors: Vadrevu, Srinivas; Gelgi, Fatih; Nagarajan, Saravanakumar; Davulcu, Hasan
Format: Article
Language: English
Online access: Full text
Description
Summary:
Purpose - The purpose of this research is to automatically separate and extract meta-data and instance information from various link pages on the web by utilizing the web's presentation and linkage regularities.
Design/methodology/approach - Research objectives have been achieved through an information extraction system called semantic partitioner, which automatically organizes the content of each web page into a hierarchical structure, and an algorithm that interprets and translates these hierarchical structures into logical statements by distinguishing and representing the meta-data and their individual data instances.
Findings - Experimental results for the university domain, covering 12 computer science department web sites that comprise 361 individual faculty and course home pages, indicate that meta-data and instance extraction average 85 and 88 percent F-measure, respectively. Our METEOR system achieves this performance without any domain-specific engineering.
Originality/value - Important contributions of the METEOR system presented in this paper are: it performs extraction without assuming that the object instance pages are template-driven; it is domain independent and does not require any previously engineered domain ontology; and, by interpreting the link pages, it can extract both meta-data (such as concept and attribute names and their relationships) and their instances with high accuracy.
ISSN: 1468-4527; 1468-4535
DOI: 10.1108/14684520610675807
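
Note on the reported metric: the summary gives extraction quality as an F-measure but does not say which variant was used. The sketch below assumes the common balanced form (F1, the harmonic mean of precision and recall). The function name `f_measure` and the counts are hypothetical and are not taken from the paper.

```python
def f_measure(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    Assumption: the record does not state which F-measure variant the
    authors used; F1 is shown here only as the most common choice.
    """
    predicted = true_positives + false_positives   # items the extractor returned
    actual = true_positives + false_negatives      # items that should have been returned
    if true_positives == 0 or predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only (not data from the paper).
score = f_measure(true_positives=85, false_positives=15, false_negatives=15)
print(f"F-measure: {score:.2f}")
```

With these illustrative counts, precision and recall are both 0.85, so the harmonic mean is also 0.85.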