Querying text databases for efficient information extraction

A wealth of information is hidden within unstructured text. This information is often best exploited in structured or relational form, which is suited for sophisticated query processing, for integration with relational databases, and for data mining. Current information extraction techniques extract...

Detailed description

Bibliographic details
Main authors:	Agichtein, E.; Gravano, L.
Format:	Conference proceeding
Language:	eng
Subjects:	Corporate acquisitions; Data mining; Government; Humans; Information filtering; Information filters; Information retrieval; Monitoring; Query processing; Relational databases
Online access:	Order full text

container_end_page	124
container_issue
container_start_page	113
container_title
container_volume
creator	Agichtein, E. ; Gravano, L.
description	A wealth of information is hidden within unstructured text. This information is often best exploited in structured or relational form, which is suited for sophisticated query processing, for integration with relational databases, and for data mining. Current information extraction techniques extract relations from a text database by examining every document in the database, or use filters to select promising documents for extraction. The exhaustive scanning approach is not practical or even feasible for large databases, and the current filtering techniques require human involvement to maintain and to adapt to new databases and domains. We develop an automatic query-based technique to retrieve documents useful for the extraction of user-defined relations from large text databases, which can be adapted to new domains, databases, or target relations with minimal human effort. We report a thorough experimental evaluation over a large newspaper archive that shows that we significantly improve the efficiency of the extraction process by focusing only on promising documents.
doi_str_mv	10.1109/ICDE.2003.1260786
format	Conference Proceeding
fullrecord	<record><control><sourceid>ieee_6IE</sourceid><recordid>TN_cdi_ieee_primary_1260786</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>1260786</ieee_id><sourcerecordid>1260786</sourcerecordid><originalsourceid>FETCH-LOGICAL-c264t-8dbe4787c15c4d3563f07dd8790d58f91f23c25ec175f7adeb917402f0611b623</originalsourceid><addsrcrecordid>eNotj91KAzEUhAMiKHUfQLzJC-x68p-AN7JWWyiUgl6XbHIiEbuVbAT79q7YuZlvYBgYQm4ZdIyBu1_3T8uOA4iOcQ3G6gvSOGNnAmG0VvqKNNP0AbOkEs7Ka_Kw-8ZyyuM7rfhTafTVD37CiaZjoZhSDhnHSvM454Ov-TjSuVd8-MMbcpn854TN2Rfk7Xn52q_azfZl3T9u2sC1rK2NA0pjTWAqyCiUFglMjNY4iMomxxIXgSsMzKhkfMTBMSOBJ9CMDZqLBbn7382IuP8q-eDLaX_-KH4BldpHKw</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>Querying text databases for efficient information extraction</title><source>IEEE Electronic Library (IEL) Conference Proceedings</source><creator>Agichtein, E. ; Gravano, L.</creator><creatorcontrib>Agichtein, E. ; Gravano, L.</creatorcontrib><description>A wealth of information is hidden within unstructured text. This information is often best exploited in structured or relational form, which is suited for sophisticated query processing, for integration with relational databases, and for data mining. Current information extraction techniques extract relations from a text database by examining every document in the database, or use filters to select promising documents for extraction. The exhaustive scanning approach is not practical or even feasible for large databases, and the current filtering techniques require human involvement to maintain and to adapt to new databases and domains. We develop an automatic query-based technique to retrieve documents useful for the extraction of user-defined relations from large text databases, which can be adapted to new domains, databases, or target relations with minimal human effort. 
We report a thorough experimental evaluation over a large newspaper archive that shows that we significantly improve the efficiency of the extraction process by focusing only on promising documents.</description><identifier>ISBN: 9780780376656</identifier><identifier>ISBN: 078037665X</identifier><identifier>DOI: 10.1109/ICDE.2003.1260786</identifier><language>eng</language><publisher>IEEE</publisher><subject>Corporate acquisitions ; Data mining ; Government ; Humans ; Information filtering ; Information filters ; Information retrieval ; Monitoring ; Query processing ; Relational databases</subject><ispartof>Proceedings 19th International Conference on Data Engineering (Cat. No.03CH37405), 2003, p.113-124</ispartof><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c264t-8dbe4787c15c4d3563f07dd8790d58f91f23c25ec175f7adeb917402f0611b623</citedby></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/1260786$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,780,784,789,790,2058,4050,4051,27925,54920</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/1260786$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Agichtein, E.</creatorcontrib><creatorcontrib>Gravano, L.</creatorcontrib><title>Querying text databases for efficient information extraction</title><title>Proceedings 19th International Conference on Data Engineering (Cat. No.03CH37405)</title><addtitle>ICDE</addtitle><description>A wealth of information is hidden within unstructured text. This information is often best exploited in structured or relational form, which is suited for sophisticated query processing, for integration with relational databases, and for data mining. 
Current information extraction techniques extract relations from a text database by examining every document in the database, or use filters to select promising documents for extraction. The exhaustive scanning approach is not practical or even feasible for large databases, and the current filtering techniques require human involvement to maintain and to adapt to new databases and domains. We develop an automatic query-based technique to retrieve documents useful for the extraction of user-defined relations from large text databases, which can be adapted to new domains, databases, or target relations with minimal human effort. We report a thorough experimental evaluation over a large newspaper archive that shows that we significantly improve the efficiency of the extraction process by focusing only on promising documents.</description><subject>Corporate acquisitions</subject><subject>Data mining</subject><subject>Government</subject><subject>Humans</subject><subject>Information filtering</subject><subject>Information filters</subject><subject>Information retrieval</subject><subject>Monitoring</subject><subject>Query processing</subject><subject>Relational databases</subject><isbn>9780780376656</isbn><isbn>078037665X</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2003</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><sourceid>RIE</sourceid><recordid>eNotj91KAzEUhAMiKHUfQLzJC-x68p-AN7JWWyiUgl6XbHIiEbuVbAT79q7YuZlvYBgYQm4ZdIyBu1_3T8uOA4iOcQ3G6gvSOGNnAmG0VvqKNNP0AbOkEs7Ka_Kw-8ZyyuM7rfhTafTVD37CiaZjoZhSDhnHSvM454Ov-TjSuVd8-MMbcpn854TN2Rfk7Xn52q_azfZl3T9u2sC1rK2NA0pjTWAqyCiUFglMjNY4iMomxxIXgSsMzKhkfMTBMSOBJ9CMDZqLBbn7382IuP8q-eDLaX_-KH4BldpHKw</recordid><startdate>2003</startdate><enddate>2003</enddate><creator>Agichtein, E.</creator><creator>Gravano, 
L.</creator><general>IEEE</general><scope>6IE</scope><scope>6IH</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIO</scope></search><sort><creationdate>2003</creationdate><title>Querying text databases for efficient information extraction</title><author>Agichtein, E. ; Gravano, L.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c264t-8dbe4787c15c4d3563f07dd8790d58f91f23c25ec175f7adeb917402f0611b623</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2003</creationdate><topic>Corporate acquisitions</topic><topic>Data mining</topic><topic>Government</topic><topic>Humans</topic><topic>Information filtering</topic><topic>Information filters</topic><topic>Information retrieval</topic><topic>Monitoring</topic><topic>Query processing</topic><topic>Relational databases</topic><toplevel>online_resources</toplevel><creatorcontrib>Agichtein, E.</creatorcontrib><creatorcontrib>Gravano, L.</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan (POP) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE Electronic Library (IEL)</collection><collection>IEEE Proceedings Order Plans (POP) 1998-present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Agichtein, E.</au><au>Gravano, L.</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>Querying text databases for efficient information extraction</atitle><btitle>Proceedings 19th International Conference on Data Engineering (Cat. 
No.03CH37405)</btitle><stitle>ICDE</stitle><date>2003</date><risdate>2003</risdate><spage>113</spage><epage>124</epage><pages>113-124</pages><isbn>9780780376656</isbn><isbn>078037665X</isbn><abstract>A wealth of information is hidden within unstructured text. This information is often best exploited in structured or relational form, which is suited for sophisticated query processing, for integration with relational databases, and for data mining. Current information extraction techniques extract relations from a text database by examining every document in the database, or use filters to select promising documents for extraction. The exhaustive scanning approach is not practical or even feasible for large databases, and the current filtering techniques require human involvement to maintain and to adapt to new databases and domains. We develop an automatic query-based technique to retrieve documents useful for the extraction of user-defined relations from large text databases, which can be adapted to new domains, databases, or target relations with minimal human effort. We report a thorough experimental evaluation over a large newspaper archive that shows that we significantly improve the efficiency of the extraction process by focusing only on promising documents.</abstract><pub>IEEE</pub><doi>10.1109/ICDE.2003.1260786</doi><tpages>12</tpages><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier	ISBN: 9780780376656
ispartof	Proceedings 19th International Conference on Data Engineering (Cat. No.03CH37405), 2003, p.113-124
issn
language	eng
recordid	cdi_ieee_primary_1260786
source	IEEE Electronic Library (IEL) Conference Proceedings
subjects	Corporate acquisitions ; Data mining ; Government ; Humans ; Information filtering ; Information filters ; Information retrieval ; Monitoring ; Query processing ; Relational databases
title	Querying text databases for efficient information extraction
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-26T14%3A40%3A28IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Querying%20text%20databases%20for%20efficient%20information%20extraction&rft.btitle=Proceedings%2019th%20International%20Conference%20on%20Data%20Engineering%20(Cat.%20No.03CH37405)&rft.au=Agichtein,%20E.&rft.date=2003&rft.spage=113&rft.epage=124&rft.pages=113-124&rft.isbn=9780780376656&rft.isbn_list=078037665X&rft_id=info:doi/10.1109/ICDE.2003.1260786&rft_dat=%3Cieee_6IE%3E1260786%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=1260786&rfr_iscdi=true