Automatic rule refinement for information extraction

Rule-based information extraction from text is increasingly being used to populate databases and to support structured queries on unstructured text. Specification of suitable information extraction rules requires considerable skill and standard practice is to refine rules iteratively, with substanti...

Bibliographic details
Published in:	Proceedings of the VLDB Endowment 2010-09, Vol.3 (1-2), p.588-597
Main authors:	Liu, Bin; Chiticariu, Laura; Chu, Vivian; Jagadish, H. V.; Reiss, Frederick R.
Format:	Article
Language:	eng
Online access:	Full text

container_end_page	597
container_issue	1-2
container_start_page	588
container_title	Proceedings of the VLDB Endowment
container_volume	3
creator	Liu, Bin; Chiticariu, Laura; Chu, Vivian; Jagadish, H. V.; Reiss, Frederick R.
description	Rule-based information extraction from text is increasingly being used to populate databases and to support structured queries on unstructured text. Specification of suitable information extraction rules requires considerable skill, and standard practice is to refine rules iteratively, with substantial effort. In this paper, we show that techniques developed in the context of data provenance, to determine the lineage of a tuple in a database, can be leveraged to assist in rule refinement. Specifically, given a set of extraction rules and correct and incorrect extracted data, we have developed a technique to suggest a ranked list of rule modifications that an expert rule specifier can consider. We implemented our technique in the SystemT information extraction system developed at IBM Research - Almaden and experimentally demonstrate its effectiveness.
doi_str_mv	10.14778/1920841.1920916
format	Article
fulltext	fulltext
identifier	ISSN: 2150-8097
ispartof	Proceedings of the VLDB Endowment, 2010-09, Vol.3 (1-2), p.588-597
issn	2150-8097 2150-8097
language	eng
recordid	cdi_crossref_primary_10_14778_1920841_1920916
source	ACM Digital Library Complete
title	Automatic rule refinement for information extraction
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-31T18%3A47%3A09IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-crossref&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Automatic%20rule%20refinement%20for%20information%20extraction&rft.jtitle=Proceedings%20of%20the%20VLDB%20Endowment&rft.au=Liu,%20Bin&rft.date=2010-09-01&rft.volume=3&rft.issue=1-2&rft.spage=588&rft.epage=597&rft.pages=588-597&rft.issn=2150-8097&rft.eissn=2150-8097&rft_id=info:doi/10.14778/1920841.1920916&rft_dat=%3Ccrossref%3E10_14778_1920841_1920916%3C/crossref%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true