Gathering meta-data and instances from object referral lists on the web

Purpose - The purpose of this research is to automatically separate and extract meta-data and instance information from various link pages on the web, by utilizing presentation and linkage regularities on the web. Design/methodology/approach - Research objectives have been achieved through an informa...

Detailed description

Bibliographic details
Published in: Online information review 2006-01, Vol.30 (3), p.278-296
Main authors: Vadrevu, Srinivas; Gelgi, Fatih; Nagarajan, Saravanakumar; Davulcu, Hasan
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 296
container_issue 3
container_start_page 278
container_title Online information review
container_volume 30
creator Vadrevu, Srinivas
Gelgi, Fatih
Nagarajan, Saravanakumar
Davulcu, Hasan
description Purpose - The purpose of this research is to automatically separate and extract meta-data and instance information from various link pages on the web, by utilizing presentation and linkage regularities on the web. Design/methodology/approach - Research objectives have been achieved through an information extraction system called semantic partitioner that automatically organizes the content in each web page into a hierarchical structure, and an algorithm that interprets and translates these hierarchical structures into logical statements by distinguishing and representing the meta-data and their individual data instances. Findings - Experimental results for the university domain with 12 computer science department web sites, comprising 361 individual faculty and course home pages, indicate that the performance of the meta-data and instance extraction averages 85 and 88 percent F-measure, respectively. Our METEOR system achieves this performance without any domain-specific engineering requirement. Originality/value - Important contributions of the METEOR system presented in this paper are: it performs extraction without the assumption that the object instance pages are template-driven; it is domain independent and does not require any previously engineered domain ontology; and by interpreting the link pages, it can extract both meta-data, such as concept and attribute names and their relationships, and their instances with high accuracy.
doi_str_mv 10.1108/14684520610675807
format Article
fullrecord <record><control><sourceid>proquest_emera</sourceid><recordid>TN_cdi_proquest_miscellaneous_57662345</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>57662345</sourcerecordid><originalsourceid>FETCH-LOGICAL-c443t-cb9fe70aed4ac3dce8b691fd21f279807a200f546691a547da459989241b559e3</originalsourceid><addsrcrecordid>eNqFkU1LAzEQhoMoWKs_wFvw4MnVJJvsJkcpWoWCFz2HbDLRLftRkxTx35tSUagUTzMMzzsf7yB0Tsk1pUTeUF5JLhipKKlqIUl9gCabWsFFKQ5_clYfo5MYl4RQxksxQfO5SW8Q2uEV95BM4Uwy2AwOt0NMZrAQsQ9jj8dmCTbhAB5CMB3u2pgiHgec1fgDmlN05E0X4ew7TtHL_d3z7KFYPM0fZ7eLwnJepsI2ykNNDDhubOksyKZS1DtGPatVXtswQrzgVa4awWtnuFBKKsZpI4SCcoout31XYXxfQ0y6b6OFrjMDjOuoRV1VrMxH_weWgipJpfoXZJJJkb3K4MUOuBzXYcjXaqq4IIrQDUS3kA1jjNkuvQptb8KnpkRvPqX_fCprrrYa6CF7634lu6heOZ9xsgffO-ELqXCffA</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>194509015</pqid></control><display><type>article</type><title>Gathering meta-data and instances from object referral lists on the web</title><source>Emerald A-Z Current Journals</source><source>Standard: Emerald eJournal Premier Collection</source><creator>Vadrevu, Srinivas ; Gelgi, Fatih ; Nagarajan, Saravanakumar ; Davulcu, Hasan</creator><contributor>Sicilia, Miguel‐Angel</contributor><creatorcontrib>Vadrevu, Srinivas ; Gelgi, Fatih ; Nagarajan, Saravanakumar ; Davulcu, Hasan ; Sicilia, Miguel‐Angel</creatorcontrib><description>Purpose - The purpose of this research is to automatically separate and extract meta-data and instance information from various link pages in the web, by utilizing presentation and linkage regularities on the web.Design methodology approach - Research objectives have been achieved through an information extraction system called semantic partitioner that automatically organizes the content in each web page into a hierarchical structure, and an algorithm that interprets and translates these hierarchical structures into logical statements by 
distinguishing and representing the meta-data and their individual data instances.Findings - Experimental results for the university domain with 12 computer science department web sites, comprising 361 individual faculty and course home pages indicate that the performance of the meta-data and instance extraction averages 85, 88 percent F-measure, respectively. Our METEOR system achieves this performance without any domain specific engineering requirement.Originality value - Important contributions of the METEOR system presented in this paper are: it performs extraction without the assumption that the object instance pages are template-driven; it is domain independent and does not require any previously engineered domain ontology; and by interpreting the link pages, it can extract both meta-data, such as concept and attribute names and their relationships, as well as their instances with high accuracy.</description><identifier>ISSN: 1468-4527</identifier><identifier>EISSN: 1468-4535</identifier><identifier>DOI: 10.1108/14684520610675807</identifier><language>eng</language><publisher>Bradford: Emerald Group Publishing Limited</publisher><subject>Algorithms ; Arizona ; Arizona University ; Computer science ; Directories ; Information Retrieval ; Information Services ; Information Sources ; Mathematics ; Metadata ; METEOR ; Online information retrieval ; Referral ; Searching ; Semantics ; Semiotics ; Universities ; USA ; Web sites ; World Wide Web</subject><ispartof>Online information review, 2006-01, Vol.30 (3), p.278-296</ispartof><rights>Emerald Group Publishing Limited</rights><rights>Copyright Emerald Group Publishing Limited 
2006</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c443t-cb9fe70aed4ac3dce8b691fd21f279807a200f546691a547da459989241b559e3</citedby><cites>FETCH-LOGICAL-c443t-cb9fe70aed4ac3dce8b691fd21f279807a200f546691a547da459989241b559e3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.emerald.com/insight/content/doi/10.1108/14684520610675807/full/pdf$$EPDF$$P50$$Gemerald$$H</linktopdf><linktohtml>$$Uhttps://www.emerald.com/insight/content/doi/10.1108/14684520610675807/full/html$$EHTML$$P50$$Gemerald$$H</linktohtml><link.rule.ids>314,780,784,967,11635,21695,27924,27925,52686,52689,53244,53372</link.rule.ids></links><search><contributor>Sicilia, Miguel‐Angel</contributor><creatorcontrib>Vadrevu, Srinivas</creatorcontrib><creatorcontrib>Gelgi, Fatih</creatorcontrib><creatorcontrib>Nagarajan, Saravanakumar</creatorcontrib><creatorcontrib>Davulcu, Hasan</creatorcontrib><title>Gathering meta-data and instances from object referral lists on the web</title><title>Online information review</title><description>Purpose - The purpose of this research is to automatically separate and extract meta-data and instance information from various link pages in the web, by utilizing presentation and linkage regularities on the web.Design methodology approach - Research objectives have been achieved through an information extraction system called semantic partitioner that automatically organizes the content in each web page into a hierarchical structure, and an algorithm that interprets and translates these hierarchical structures into logical statements by distinguishing and representing the meta-data and their individual data instances.Findings - Experimental results for the university domain with 12 computer science department web sites, comprising 361 individual faculty and course 
home pages indicate that the performance of the meta-data and instance extraction averages 85, 88 percent F-measure, respectively. Our METEOR system achieves this performance without any domain specific engineering requirement.Originality value - Important contributions of the METEOR system presented in this paper are: it performs extraction without the assumption that the object instance pages are template-driven; it is domain independent and does not require any previously engineered domain ontology; and by interpreting the link pages, it can extract both meta-data, such as concept and attribute names and their relationships, as well as their instances with high accuracy.</description><subject>Algorithms</subject><subject>Arizona</subject><subject>Arizona University</subject><subject>Computer science</subject><subject>Directories</subject><subject>Information Retrieval</subject><subject>Information Services</subject><subject>Information Sources</subject><subject>Mathematics</subject><subject>Metadata</subject><subject>METEOR</subject><subject>Online information retrieval</subject><subject>Referral</subject><subject>Searching</subject><subject>Semantics</subject><subject>Semiotics</subject><subject>Universities</subject><subject>USA</subject><subject>Web sites</subject><subject>World Wide 
Web</subject><issn>1468-4527</issn><issn>1468-4535</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2006</creationdate><recordtype>article</recordtype><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><sourceid>GNUQQ</sourceid><sourceid>GUQSH</sourceid><sourceid>M2O</sourceid><recordid>eNqFkU1LAzEQhoMoWKs_wFvw4MnVJJvsJkcpWoWCFz2HbDLRLftRkxTx35tSUagUTzMMzzsf7yB0Tsk1pUTeUF5JLhipKKlqIUl9gCabWsFFKQ5_clYfo5MYl4RQxksxQfO5SW8Q2uEV95BM4Uwy2AwOt0NMZrAQsQ9jj8dmCTbhAB5CMB3u2pgiHgec1fgDmlN05E0X4ew7TtHL_d3z7KFYPM0fZ7eLwnJepsI2ykNNDDhubOksyKZS1DtGPatVXtswQrzgVa4awWtnuFBKKsZpI4SCcoout31XYXxfQ0y6b6OFrjMDjOuoRV1VrMxH_weWgipJpfoXZJJJkb3K4MUOuBzXYcjXaqq4IIrQDUS3kA1jjNkuvQptb8KnpkRvPqX_fCprrrYa6CF7634lu6heOZ9xsgffO-ELqXCffA</recordid><startdate>20060101</startdate><enddate>20060101</enddate><creator>Vadrevu, Srinivas</creator><creator>Gelgi, Fatih</creator><creator>Nagarajan, Saravanakumar</creator><creator>Davulcu, Hasan</creator><general>Emerald Group Publishing 
Limited</general><scope>AAYXX</scope><scope>CITATION</scope><scope>0-V</scope><scope>0U~</scope><scope>1-H</scope><scope>7RV</scope><scope>7SC</scope><scope>7WY</scope><scope>7WZ</scope><scope>7XB</scope><scope>8AO</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>8FI</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ALSLI</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BEZIV</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>CJNVE</scope><scope>CNYFK</scope><scope>DWQXO</scope><scope>E3H</scope><scope>F2A</scope><scope>FYUFA</scope><scope>F~G</scope><scope>GNUQQ</scope><scope>GUQSH</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K6~</scope><scope>K7-</scope><scope>L.-</scope><scope>L.0</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>M0C</scope><scope>M0N</scope><scope>M0P</scope><scope>M1O</scope><scope>M2O</scope><scope>MBDVC</scope><scope>NAPCQ</scope><scope>P5Z</scope><scope>P62</scope><scope>PQBIZ</scope><scope>PQEDU</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>Q9U</scope><scope>7TA</scope><scope>JG9</scope></search><sort><creationdate>20060101</creationdate><title>Gathering meta-data and instances from object referral lists on the web</title><author>Vadrevu, Srinivas ; Gelgi, Fatih ; Nagarajan, Saravanakumar ; Davulcu, Hasan</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c443t-cb9fe70aed4ac3dce8b691fd21f279807a200f546691a547da459989241b559e3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2006</creationdate><topic>Algorithms</topic><topic>Arizona</topic><topic>Arizona University</topic><topic>Computer science</topic><topic>Directories</topic><topic>Information Retrieval</topic><topic>Information Services</topic><topic>Information Sources</topic><topic>Mathematics</topic><topic>Metadata</topic><topic>METEOR</topic><topic>Online information 
retrieval</topic><topic>Referral</topic><topic>Searching</topic><topic>Semantics</topic><topic>Semiotics</topic><topic>Universities</topic><topic>USA</topic><topic>Web sites</topic><topic>World Wide Web</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Vadrevu, Srinivas</creatorcontrib><creatorcontrib>Gelgi, Fatih</creatorcontrib><creatorcontrib>Nagarajan, Saravanakumar</creatorcontrib><creatorcontrib>Davulcu, Hasan</creatorcontrib><collection>CrossRef</collection><collection>ProQuest Social Sciences Premium Collection</collection><collection>Global News &amp; ABI/Inform Professional</collection><collection>Trade PRO</collection><collection>Nursing &amp; Allied Health Database</collection><collection>Computer and Information Systems Abstracts</collection><collection>Access via ABI/INFORM (ProQuest)</collection><collection>ABI/INFORM Global (PDF only)</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>ProQuest Pharma Collection</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Hospital Premium Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>Social Science Premium Collection</collection><collection>Advanced Technologies &amp; Aerospace Collection</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Business Premium Collection</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>Education Collection</collection><collection>Library &amp; Information Science Collection</collection><collection>ProQuest Central Korea</collection><collection>Library &amp; Information Sciences Abstracts (LISA)</collection><collection>Library &amp; 
Information Science Abstracts (LISA)</collection><collection>Health Research Premium Collection</collection><collection>ABI/INFORM Global (Corporate)</collection><collection>ProQuest Central Student</collection><collection>Research Library Prep</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>ProQuest Business Collection</collection><collection>Computer Science Database</collection><collection>ABI/INFORM Professional Advanced</collection><collection>ABI/INFORM Professional Standard</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>ABI/INFORM Global</collection><collection>Computing Database</collection><collection>Education Database</collection><collection>Library Science Database</collection><collection>Research Library</collection><collection>Research Library (Corporate)</collection><collection>Nursing &amp; Allied Health Premium</collection><collection>Advanced Technologies &amp; Aerospace Database</collection><collection>ProQuest Advanced Technologies &amp; Aerospace Collection</collection><collection>ProQuest One Business</collection><collection>ProQuest One Education</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central Basic</collection><collection>Materials Business File</collection><collection>Materials Research Database</collection><jtitle>Online information review</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Vadrevu, Srinivas</au><au>Gelgi, Fatih</au><au>Nagarajan, Saravanakumar</au><au>Davulcu, Hasan</au><au>Sicilia, 
Miguel‐Angel</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Gathering meta-data and instances from object referral lists on the web</atitle><jtitle>Online information review</jtitle><date>2006-01-01</date><risdate>2006</risdate><volume>30</volume><issue>3</issue><spage>278</spage><epage>296</epage><pages>278-296</pages><issn>1468-4527</issn><eissn>1468-4535</eissn><abstract>Purpose - The purpose of this research is to automatically separate and extract meta-data and instance information from various link pages in the web, by utilizing presentation and linkage regularities on the web.Design methodology approach - Research objectives have been achieved through an information extraction system called semantic partitioner that automatically organizes the content in each web page into a hierarchical structure, and an algorithm that interprets and translates these hierarchical structures into logical statements by distinguishing and representing the meta-data and their individual data instances.Findings - Experimental results for the university domain with 12 computer science department web sites, comprising 361 individual faculty and course home pages indicate that the performance of the meta-data and instance extraction averages 85, 88 percent F-measure, respectively. 
Our METEOR system achieves this performance without any domain specific engineering requirement.Originality value - Important contributions of the METEOR system presented in this paper are: it performs extraction without the assumption that the object instance pages are template-driven; it is domain independent and does not require any previously engineered domain ontology; and by interpreting the link pages, it can extract both meta-data, such as concept and attribute names and their relationships, as well as their instances with high accuracy.</abstract><cop>Bradford</cop><pub>Emerald Group Publishing Limited</pub><doi>10.1108/14684520610675807</doi><tpages>19</tpages></addata></record>
fulltext fulltext
identifier ISSN: 1468-4527
ispartof Online information review, 2006-01, Vol.30 (3), p.278-296
issn 1468-4527
1468-4535
language eng
recordid cdi_proquest_miscellaneous_57662345
source Emerald A-Z Current Journals; Standard: Emerald eJournal Premier Collection
subjects Algorithms
Arizona
Arizona University
Computer science
Directories
Information Retrieval
Information Services
Information Sources
Mathematics
Metadata
METEOR
Online information retrieval
Referral
Searching
Semantics
Semiotics
Universities
USA
Web sites
World Wide Web
title Gathering meta-data and instances from object referral lists on the web
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-21T14%3A10%3A04IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_emera&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Gathering%20meta-data%20and%20instances%20from%20object%20referral%20lists%20on%20the%20web&rft.jtitle=Online%20information%20review&rft.au=Vadrevu,%20Srinivas&rft.date=2006-01-01&rft.volume=30&rft.issue=3&rft.spage=278&rft.epage=296&rft.pages=278-296&rft.issn=1468-4527&rft.eissn=1468-4535&rft_id=info:doi/10.1108/14684520610675807&rft_dat=%3Cproquest_emera%3E57662345%3C/proquest_emera%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=194509015&rft_id=info:pmid/&rfr_iscdi=true
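
Note on the reported metric: the abstract gives extraction quality as an F-measure (85 percent for meta-data, 88 percent for instances) but this record does not say how precision and recall were counted. The sketch below shows only the standard definition, the harmonic mean of precision and recall, computed over sets of extracted labels. The function names and the example labels are hypothetical and are not taken from the paper or from the METEOR system.

```python
from typing import Set, Tuple


def precision_recall(extracted: Set[str], gold: Set[str]) -> Tuple[float, float]:
    """Return (precision, recall) of an extracted label set against a gold set."""
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        return 0.0  # both precision and recall are zero
    return (1 + beta ** 2) * precision * recall / denominator


# Hypothetical attribute names for a faculty home page (illustration only).
extracted_labels = {"name", "email", "phone", "office"}
gold_labels = {"name", "email", "phone", "fax", "research interests"}

p, r = precision_recall(extracted_labels, gold_labels)
print(f"precision={p:.2f} recall={r:.2f} F1={f_measure(p, r):.2f}")
# prints: precision=0.75 recall=0.60 F1=0.67
```

Because the harmonic mean is pulled toward the lower of the two values, a high F-measure requires both precision and recall to be high.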