Gathering meta-data and instances from object referral lists on the web
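Published in: | Online information review 2006-01, Vol.30 (3), p.278-296 |
---|---|
Main authors: | Vadrevu, Srinivas; Gelgi, Fatih; Nagarajan, Saravanakumar; Davulcu, Hasan |
Format: | Article |
Language: | English |
Keywords: | Algorithms; Arizona; Arizona University; Computer science; Directories; Information Retrieval; Information Services; Information Sources; Mathematics; Metadata; METEOR; Online information retrieval; Referral; Searching; Semantics; Semiotics; Universities; USA; Web sites; World Wide Web |
Online access: | Full text |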
container_end_page | 296 |
---|---|
container_issue | 3 |
container_start_page | 278 |
container_title | Online information review |
container_volume | 30 |
creator | Vadrevu, Srinivas; Gelgi, Fatih; Nagarajan, Saravanakumar; Davulcu, Hasan |
description | Purpose - The purpose of this research is to automatically separate and extract meta-data and instance information from various link pages on the web by utilizing presentation and linkage regularities on the web. Design/methodology/approach - The research objectives were achieved through an information extraction system called the semantic partitioner, which automatically organizes the content of each web page into a hierarchical structure, and an algorithm that interprets and translates these hierarchical structures into logical statements by distinguishing and representing the meta-data and their individual data instances. Findings - Experimental results for the university domain, covering 12 computer science department web sites comprising 361 individual faculty and course home pages, indicate that meta-data extraction and instance extraction achieve average F-measures of 85 and 88 percent, respectively. Our METEOR system achieves this performance without any domain-specific engineering. Originality/value - The important contributions of the METEOR system presented in this paper are that it performs extraction without assuming that the object instance pages are template-driven; it is domain independent and does not require any previously engineered domain ontology; and, by interpreting the link pages, it can extract with high accuracy both meta-data, such as concept and attribute names and their relationships, and their instances. |
doi_str_mv | 10.1108/14684520610675807 |
format | Article |
fullrecord | <record><control><sourceid>proquest_emera</sourceid><recordid>TN_cdi_proquest_miscellaneous_57662345</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>57662345</sourcerecordid><originalsourceid>FETCH-LOGICAL-c443t-cb9fe70aed4ac3dce8b691fd21f279807a200f546691a547da459989241b559e3</originalsourceid><addsrcrecordid>eNqFkU1LAzEQhoMoWKs_wFvw4MnVJJvsJkcpWoWCFz2HbDLRLftRkxTx35tSUagUTzMMzzsf7yB0Tsk1pUTeUF5JLhipKKlqIUl9gCabWsFFKQ5_clYfo5MYl4RQxksxQfO5SW8Q2uEV95BM4Uwy2AwOt0NMZrAQsQ9jj8dmCTbhAB5CMB3u2pgiHgec1fgDmlN05E0X4ew7TtHL_d3z7KFYPM0fZ7eLwnJepsI2ykNNDDhubOksyKZS1DtGPatVXtswQrzgVa4awWtnuFBKKsZpI4SCcoout31XYXxfQ0y6b6OFrjMDjOuoRV1VrMxH_weWgipJpfoXZJJJkb3K4MUOuBzXYcjXaqq4IIrQDUS3kA1jjNkuvQptb8KnpkRvPqX_fCprrrYa6CF7634lu6heOZ9xsgffO-ELqXCffA</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>194509015</pqid></control><display><type>article</type><title>Gathering meta-data and instances from object referral lists on the web</title><source>Emerald A-Z Current Journals</source><source>Standard: Emerald eJournal Premier Collection</source><creator>Vadrevu, Srinivas ; Gelgi, Fatih ; Nagarajan, Saravanakumar ; Davulcu, Hasan</creator><contributor>Sicilia, Miguel‐Angel</contributor><creatorcontrib>Vadrevu, Srinivas ; Gelgi, Fatih ; Nagarajan, Saravanakumar ; Davulcu, Hasan ; Sicilia, Miguel‐Angel</creatorcontrib><description>Purpose - The purpose of this research is to automatically separate and extract meta-data and instance information from various link pages in the web, by utilizing presentation and linkage regularities on the web.Design methodology approach - Research objectives have been achieved through an information extraction system called semantic partitioner that automatically organizes the content in each web page into a hierarchical structure, and an algorithm that interprets and translates these hierarchical structures into logical statements by 
distinguishing and representing the meta-data and their individual data instances.Findings - Experimental results for the university domain with 12 computer science department web sites, comprising 361 individual faculty and course home pages indicate that the performance of the meta-data and instance extraction averages 85, 88 percent F-measure, respectively. Our METEOR system achieves this performance without any domain specific engineering requirement.Originality value - Important contributions of the METEOR system presented in this paper are: it performs extraction without the assumption that the object instance pages are template-driven; it is domain independent and does not require any previously engineered domain ontology; and by interpreting the link pages, it can extract both meta-data, such as concept and attribute names and their relationships, as well as their instances with high accuracy.</description><identifier>ISSN: 1468-4527</identifier><identifier>EISSN: 1468-4535</identifier><identifier>DOI: 10.1108/14684520610675807</identifier><language>eng</language><publisher>Bradford: Emerald Group Publishing Limited</publisher><subject>Algorithms ; Arizona ; Arizona University ; Computer science ; Directories ; Information Retrieval ; Information Services ; Information Sources ; Mathematics ; Metadata ; METEOR ; Online information retrieval ; Referral ; Searching ; Semantics ; Semiotics ; Universities ; USA ; Web sites ; World Wide Web</subject><ispartof>Online information review, 2006-01, Vol.30 (3), p.278-296</ispartof><rights>Emerald Group Publishing Limited</rights><rights>Copyright Emerald Group Publishing Limited 
2006</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c443t-cb9fe70aed4ac3dce8b691fd21f279807a200f546691a547da459989241b559e3</citedby><cites>FETCH-LOGICAL-c443t-cb9fe70aed4ac3dce8b691fd21f279807a200f546691a547da459989241b559e3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.emerald.com/insight/content/doi/10.1108/14684520610675807/full/pdf$$EPDF$$P50$$Gemerald$$H</linktopdf><linktohtml>$$Uhttps://www.emerald.com/insight/content/doi/10.1108/14684520610675807/full/html$$EHTML$$P50$$Gemerald$$H</linktohtml><link.rule.ids>314,780,784,967,11635,21695,27924,27925,52686,52689,53244,53372</link.rule.ids></links><search><contributor>Sicilia, Miguel‐Angel</contributor><creatorcontrib>Vadrevu, Srinivas</creatorcontrib><creatorcontrib>Gelgi, Fatih</creatorcontrib><creatorcontrib>Nagarajan, Saravanakumar</creatorcontrib><creatorcontrib>Davulcu, Hasan</creatorcontrib><title>Gathering meta-data and instances from object referral lists on the web</title><title>Online information review</title><description>Purpose - The purpose of this research is to automatically separate and extract meta-data and instance information from various link pages in the web, by utilizing presentation and linkage regularities on the web.Design methodology approach - Research objectives have been achieved through an information extraction system called semantic partitioner that automatically organizes the content in each web page into a hierarchical structure, and an algorithm that interprets and translates these hierarchical structures into logical statements by distinguishing and representing the meta-data and their individual data instances.Findings - Experimental results for the university domain with 12 computer science department web sites, comprising 361 individual faculty and course 
home pages indicate that the performance of the meta-data and instance extraction averages 85, 88 percent F-measure, respectively. Our METEOR system achieves this performance without any domain specific engineering requirement.Originality value - Important contributions of the METEOR system presented in this paper are: it performs extraction without the assumption that the object instance pages are template-driven; it is domain independent and does not require any previously engineered domain ontology; and by interpreting the link pages, it can extract both meta-data, such as concept and attribute names and their relationships, as well as their instances with high accuracy.</description><subject>Algorithms</subject><subject>Arizona</subject><subject>Arizona University</subject><subject>Computer science</subject><subject>Directories</subject><subject>Information Retrieval</subject><subject>Information Services</subject><subject>Information Sources</subject><subject>Mathematics</subject><subject>Metadata</subject><subject>METEOR</subject><subject>Online information retrieval</subject><subject>Referral</subject><subject>Searching</subject><subject>Semantics</subject><subject>Semiotics</subject><subject>Universities</subject><subject>USA</subject><subject>Web sites</subject><subject>World Wide 
Web</subject><issn>1468-4527</issn><issn>1468-4535</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2006</creationdate><recordtype>article</recordtype><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><sourceid>GNUQQ</sourceid><sourceid>GUQSH</sourceid><sourceid>M2O</sourceid><recordid>eNqFkU1LAzEQhoMoWKs_wFvw4MnVJJvsJkcpWoWCFz2HbDLRLftRkxTx35tSUagUTzMMzzsf7yB0Tsk1pUTeUF5JLhipKKlqIUl9gCabWsFFKQ5_clYfo5MYl4RQxksxQfO5SW8Q2uEV95BM4Uwy2AwOt0NMZrAQsQ9jj8dmCTbhAB5CMB3u2pgiHgec1fgDmlN05E0X4ew7TtHL_d3z7KFYPM0fZ7eLwnJepsI2ykNNDDhubOksyKZS1DtGPatVXtswQrzgVa4awWtnuFBKKsZpI4SCcoout31XYXxfQ0y6b6OFrjMDjOuoRV1VrMxH_weWgipJpfoXZJJJkb3K4MUOuBzXYcjXaqq4IIrQDUS3kA1jjNkuvQptb8KnpkRvPqX_fCprrrYa6CF7634lu6heOZ9xsgffO-ELqXCffA</recordid><startdate>20060101</startdate><enddate>20060101</enddate><creator>Vadrevu, Srinivas</creator><creator>Gelgi, Fatih</creator><creator>Nagarajan, Saravanakumar</creator><creator>Davulcu, Hasan</creator><general>Emerald Group Publishing 
Limited</general><scope>AAYXX</scope><scope>CITATION</scope><scope>0-V</scope><scope>0U~</scope><scope>1-H</scope><scope>7RV</scope><scope>7SC</scope><scope>7WY</scope><scope>7WZ</scope><scope>7XB</scope><scope>8AO</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>8FI</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ALSLI</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BEZIV</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>CJNVE</scope><scope>CNYFK</scope><scope>DWQXO</scope><scope>E3H</scope><scope>F2A</scope><scope>FYUFA</scope><scope>F~G</scope><scope>GNUQQ</scope><scope>GUQSH</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K6~</scope><scope>K7-</scope><scope>L.-</scope><scope>L.0</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>M0C</scope><scope>M0N</scope><scope>M0P</scope><scope>M1O</scope><scope>M2O</scope><scope>MBDVC</scope><scope>NAPCQ</scope><scope>P5Z</scope><scope>P62</scope><scope>PQBIZ</scope><scope>PQEDU</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>Q9U</scope><scope>7TA</scope><scope>JG9</scope></search><sort><creationdate>20060101</creationdate><title>Gathering meta-data and instances from object referral lists on the web</title><author>Vadrevu, Srinivas ; Gelgi, Fatih ; Nagarajan, Saravanakumar ; Davulcu, Hasan</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c443t-cb9fe70aed4ac3dce8b691fd21f279807a200f546691a547da459989241b559e3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2006</creationdate><topic>Algorithms</topic><topic>Arizona</topic><topic>Arizona University</topic><topic>Computer science</topic><topic>Directories</topic><topic>Information Retrieval</topic><topic>Information Services</topic><topic>Information Sources</topic><topic>Mathematics</topic><topic>Metadata</topic><topic>METEOR</topic><topic>Online information 
retrieval</topic><topic>Referral</topic><topic>Searching</topic><topic>Semantics</topic><topic>Semiotics</topic><topic>Universities</topic><topic>USA</topic><topic>Web sites</topic><topic>World Wide Web</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Vadrevu, Srinivas</creatorcontrib><creatorcontrib>Gelgi, Fatih</creatorcontrib><creatorcontrib>Nagarajan, Saravanakumar</creatorcontrib><creatorcontrib>Davulcu, Hasan</creatorcontrib><collection>CrossRef</collection><collection>ProQuest Social Sciences Premium Collection</collection><collection>Global News & ABI/Inform Professional</collection><collection>Trade PRO</collection><collection>Nursing & Allied Health Database</collection><collection>Computer and Information Systems Abstracts</collection><collection>Access via ABI/INFORM (ProQuest)</collection><collection>ABI/INFORM Global (PDF only)</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>ProQuest Pharma Collection</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Hospital Premium Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>Social Science Premium Collection</collection><collection>Advanced Technologies & Aerospace Collection</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Business Premium Collection</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>Education Collection</collection><collection>Library & Information Science Collection</collection><collection>ProQuest Central Korea</collection><collection>Library & Information Sciences Abstracts (LISA)</collection><collection>Library & Information Science 
Abstracts (LISA)</collection><collection>Health Research Premium Collection</collection><collection>ABI/INFORM Global (Corporate)</collection><collection>ProQuest Central Student</collection><collection>Research Library Prep</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>ProQuest Business Collection</collection><collection>Computer Science Database</collection><collection>ABI/INFORM Professional Advanced</collection><collection>ABI/INFORM Professional Standard</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>ABI/INFORM Global</collection><collection>Computing Database</collection><collection>Education Database</collection><collection>Library Science Database</collection><collection>Research Library</collection><collection>Research Library (Corporate)</collection><collection>Nursing & Allied Health Premium</collection><collection>Advanced Technologies & Aerospace Database</collection><collection>ProQuest Advanced Technologies & Aerospace Collection</collection><collection>ProQuest One Business</collection><collection>ProQuest One Education</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central Basic</collection><collection>Materials Business File</collection><collection>Materials Research Database</collection><jtitle>Online information review</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Vadrevu, Srinivas</au><au>Gelgi, Fatih</au><au>Nagarajan, Saravanakumar</au><au>Davulcu, Hasan</au><au>Sicilia, 
Miguel‐Angel</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Gathering meta-data and instances from object referral lists on the web</atitle><jtitle>Online information review</jtitle><date>2006-01-01</date><risdate>2006</risdate><volume>30</volume><issue>3</issue><spage>278</spage><epage>296</epage><pages>278-296</pages><issn>1468-4527</issn><eissn>1468-4535</eissn><abstract>Purpose - The purpose of this research is to automatically separate and extract meta-data and instance information from various link pages in the web, by utilizing presentation and linkage regularities on the web.Design methodology approach - Research objectives have been achieved through an information extraction system called semantic partitioner that automatically organizes the content in each web page into a hierarchical structure, and an algorithm that interprets and translates these hierarchical structures into logical statements by distinguishing and representing the meta-data and their individual data instances.Findings - Experimental results for the university domain with 12 computer science department web sites, comprising 361 individual faculty and course home pages indicate that the performance of the meta-data and instance extraction averages 85, 88 percent F-measure, respectively. 
Our METEOR system achieves this performance without any domain specific engineering requirement.Originality value - Important contributions of the METEOR system presented in this paper are: it performs extraction without the assumption that the object instance pages are template-driven; it is domain independent and does not require any previously engineered domain ontology; and by interpreting the link pages, it can extract both meta-data, such as concept and attribute names and their relationships, as well as their instances with high accuracy.</abstract><cop>Bradford</cop><pub>Emerald Group Publishing Limited</pub><doi>10.1108/14684520610675807</doi><tpages>19</tpages></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1468-4527 |
ispartof | Online information review, 2006-01, Vol.30 (3), p.278-296 |
issn | 1468-4527 (print); 1468-4535 (online) |
language | eng |
recordid | cdi_proquest_miscellaneous_57662345 |
source | Emerald A-Z Current Journals; Standard: Emerald eJournal Premier Collection |
subjects | Algorithms; Arizona; Arizona University; Computer science; Directories; Information Retrieval; Information Services; Information Sources; Mathematics; Metadata; METEOR; Online information retrieval; Referral; Searching; Semantics; Semiotics; Universities; USA; Web sites; World Wide Web |
title | Gathering meta-data and instances from object referral lists on the web |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-21T14%3A10%3A04IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_emera&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Gathering%20meta-data%20and%20instances%20from%20object%20referral%20lists%20on%20the%20web&rft.jtitle=Online%20information%20review&rft.au=Vadrevu,%20Srinivas&rft.date=2006-01-01&rft.volume=30&rft.issue=3&rft.spage=278&rft.epage=296&rft.pages=278-296&rft.issn=1468-4527&rft.eissn=1468-4535&rft_id=info:doi/10.1108/14684520610675807&rft_dat=%3Cproquest_emera%3E57662345%3C/proquest_emera%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=194509015&rft_id=info:pmid/&rfr_iscdi=true |
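The record reports extraction quality as F-measure, which is the harmonic mean of precision and recall. The following is a minimal Python sketch of that standard computation. The function names and the example counts are hypothetical illustrations and are not taken from the paper. The record also does not say whether the reported averages (85 percent for meta-data, 88 percent for instances) are macro-averaged over sites or micro-averaged over all extracted items, so the sketch makes no claim on that point.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if not (0.0 <= precision <= 1.0 and 0.0 <= recall <= 1.0):
        raise ValueError("precision and recall must lie in [0, 1]")
    if precision == 0.0 and recall == 0.0:
        return 0.0  # convention: F is 0 when nothing correct was extracted
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def f_from_counts(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """F1 from raw counts of correct, spurious, and missed extractions (hypothetical helper)."""
    extracted = true_positives + false_positives  # everything the system output
    relevant = true_positives + false_negatives   # everything it should have output
    precision = true_positives / extracted if extracted else 0.0
    recall = true_positives / relevant if relevant else 0.0
    return f_measure(precision, recall)


if __name__ == "__main__":
    # Hypothetical counts, not data from the paper.
    print(round(f_from_counts(80, 20, 20), 3))
    print(round(f_measure(1.0, 0.5), 3))
```

Because the harmonic mean is dominated by the smaller of its two inputs, a system cannot reach a high F-measure by favoring only precision or only recall. For example, perfect precision with 50 percent recall yields an F-measure of about 0.667, not 0.75.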