Gathering meta-data and instances from object referral lists on the web

Bibliographic details
Published in:	Online Information Review 2006-01, Vol.30 (3), p.278-296
Main authors:	Vadrevu, Srinivas; Gelgi, Fatih; Nagarajan, Saravanakumar; Davulcu, Hasan
Format:	Article
Language:	English
Subjects:	Algorithms Arizona Arizona University Computer science Directories Information Retrieval Information Services Information Sources Mathematics Metadata METEOR Online information retrieval Referral Searching Semantics Semiotics Universities USA Web sites World Wide Web
Online access:	Full text

Description
Abstract:	Purpose - The purpose of this research is to automatically separate and extract meta-data and instance information from various link pages on the web by utilizing presentation and linkage regularities. Design/methodology/approach - The research objectives were achieved through an information extraction system called the semantic partitioner, which automatically organizes the content of each web page into a hierarchical structure, and an algorithm that interprets these hierarchical structures and translates them into logical statements by distinguishing and representing the meta-data and their individual data instances. Findings - Experimental results for the university domain, covering 12 computer science department web sites that comprise 361 individual faculty and course home pages, indicate that meta-data extraction and instance extraction average 85 percent and 88 percent F-measure, respectively. The METEOR system achieves this performance without any domain-specific engineering. Originality/value - The important contributions of the METEOR system presented in this paper are: it performs extraction without assuming that the object instance pages are template-driven; it is domain independent and does not require a previously engineered domain ontology; and, by interpreting the link pages, it can extract with high accuracy both meta-data, such as concept and attribute names and their relationships, and their instances.
ISSN:	1468-4527 1468-4535
DOI:	10.1108/14684520610675807
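
Note on the reported metric: the abstract gives extraction quality as an F-measure but does not say how it was computed. The sketch below assumes the common balanced form (F1, the harmonic mean of precision and recall) and set-based matching of extracted labels against a hand-labeled reference. The function names and the sample labels are hypothetical illustrations, not taken from the paper or from the METEOR system.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def score(extracted: set, gold: set) -> tuple:
    """Return (precision, recall, F1) for extracted items against a gold set."""
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall, f_measure(precision, recall)


# Hypothetical attribute labels, for illustration only.
extracted_labels = {"faculty", "courses", "office", "fax"}
gold_labels = {"faculty", "courses", "office", "email"}
precision, recall, f1 = score(extracted_labels, gold_labels)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Under this assumption, the 85 and 88 percent figures in the abstract would be averages of such per-site scores; whether the authors averaged per site or per page is not stated in this record.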