Answering Table Queries on the Web using Column Keywords

We present the design of a structured search engine which returns a multi-column table in response to a query consisting of keywords describing each of its columns. We answer such queries by exploiting the millions of tables on the Web because these are much richer sources of structured knowledge th...

Bibliographic details
Published in:	arXiv.org, 2012-06
Main authors:	Pimplikar, Rakesh; Sarawagi, Sunita
Format:	Article
Language:	eng
Subjects:	Algorithms; Columns (structural); Computer Science - Databases; Graph matching; Graph theory; Keywords; Knowledge base; Mapping; Model matching; Queries; Redundancy; Search engines; Segmentation; Websites; Workload
Online access:	Full text

container_end_page
container_issue
container_start_page
container_title	arXiv.org
container_volume
creator	Pimplikar, Rakesh ; Sarawagi, Sunita
description	We present the design of a structured search engine which returns a multi-column table in response to a query consisting of keywords describing each of its columns. We answer such queries by exploiting the millions of tables on the Web because these are much richer sources of structured knowledge than free-format text. However, a corpus of tables harvested from arbitrary HTML web pages presents huge challenges of diversity and redundancy not seen in centrally edited knowledge bases. We concentrate on one concrete task in this paper. Given a set of Web tables T1, ..., Tn, and a query Q with q sets of keywords Q1, ..., Qq, decide for each Ti whether it is relevant to Q and, if so, identify the mapping between the columns of Ti and the query columns. We represent this task as a graphical model that jointly maps all tables by incorporating diverse sources of clues spanning matches in different parts of the table, corpus-wide co-occurrence statistics, and content overlap across table columns. We define a novel query segmentation model for matching keywords to table columns, and a robust mechanism for exploiting content overlap across table columns. We design efficient inference algorithms based on bipartite matching and constrained graph cuts to solve the joint labeling task. Experiments on a workload of 59 queries over a 25 million web table corpus show a significant boost in accuracy over baseline IR methods.
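The column-mapping task stated in the description can be illustrated with a small sketch. This is not the authors' graphical model: it handles one table at a time, uses a plain token-overlap (Jaccard) score as an assumed stand-in for the paper's matching clues, and finds the best one-to-one assignment by exhaustive search rather than by an efficient bipartite matching algorithm. The names `jaccard`, `map_columns`, and the `min_score` threshold are hypothetical.

```python
from itertools import permutations


def jaccard(a, b):
    """Token-overlap similarity between two strings (assumed stand-in score)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def map_columns(query_cols, table_headers, min_score=0.2):
    """Decide relevance of one table and map query columns to table columns.

    Returns (is_relevant, mapping), where mapping[i] is the index of the
    table column assigned to query column i, or None if it is left unmapped.
    Each table column is used at most once (a one-to-one assignment).
    """
    q = len(query_cols)
    if q == 0:
        return False, []
    sim = [[jaccard(qc, th) for th in table_headers] for qc in query_cols]

    # Candidate targets: every table column, plus one "unmapped" slot per
    # query column so that the table may have fewer columns than the query.
    candidates = list(range(len(table_headers))) + [None] * q

    best_total, best_perm = -1.0, None
    for perm in permutations(candidates, q):
        total = sum(sim[i][j] for i, j in enumerate(perm) if j is not None)
        if total > best_total:
            best_total, best_perm = total, perm

    # Drop assignments that carry no evidence at all.
    mapping = [
        j if j is not None and sim[i][j] > 0 else None
        for i, j in enumerate(best_perm)
    ]
    # Hypothetical relevance rule: average match score must reach min_score.
    return best_total / q >= min_score, mapping


if __name__ == "__main__":
    query = ["country", "capital city", "population"]
    headers = ["Rank", "Country name", "Population (2010)", "Capital"]
    print(map_columns(query, headers))
```

Exhaustive search is only practical for the handful of columns in a typical query; the description names bipartite matching and constrained graph cuts for inference at scale, together with clues (corpus-wide co-occurrence, content overlap across tables) that this sketch ignores.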
doi_str_mv	10.48550/arxiv.1207.0132
format	Article
fullrecord	<record><control><sourceid>proquest_arxiv</sourceid><recordid>TN_cdi_arxiv_primary_1207_0132</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2076252418</sourcerecordid><originalsourceid>FETCH-LOGICAL-a518-e48d7eff41bd581821b9c442d88eafa4559e39bb23e45fc3f8bbbdbb853420a93</originalsourceid><addsrcrecordid>eNotj99LwzAUhYMgOObefZKAz63JTbLePo7hj-FAhIKPJVlvtKNrZ7I699_bOp8uh_txOB9jN1KkGo0R9zb81N-pBJGlQiq4YBNQSiaoAa7YLMatEALmGRijJgwXbTxSqNsPXljXEH_rh0SRdy0_fBJ_J8f7OL6XXdPvWv5Cp2MXqnjNLr1tIs3-75QVjw_F8jlZvz6tlot1Yo3EhDRWGXmvpasMSgTp8o3WUCGS9VYbk5PKnQNF2viN8uicq5xDozQIm6spuz3X_lmV-1DvbDiVo1052g3A3RnYh-6rp3got10f2mFSOTBzMKAlql8tc1GF</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2076252418</pqid></control><display><type>article</type><title>Answering Table Queries on the Web using Column Keywords</title><source>arXiv.org</source><source>Free E- Journals</source><creator>Pimplikar, Rakesh ; Sarawagi, Sunita</creator><creatorcontrib>Pimplikar, Rakesh ; Sarawagi, Sunita</creatorcontrib><description>We present the design of a structured search engine which returns a multi-column table in response to a query consisting of keywords describing each of its columns. We answer such queries by exploiting the millions of tables on the Web because these are much richer sources of structured knowledge than free-format text. However, a corpus of tables harvested from arbitrary HTML web pages presents huge challenges of diversity and redundancy not seen in centrally edited knowledge bases. We concentrate on one concrete task in this paper. Given a set of Web tables T1, . . ., Tn, and a query Q with q sets of keywords Q1, . . ., Qq, decide for each Ti if it is relevant to Q and if so, identify the mapping between the columns of Ti and query columns. 
We represent this task as a graphical model that jointly maps all tables by incorporating diverse sources of clues spanning matches in different parts of the table, corpus-wide co-occurrence statistics, and content overlap across table columns. We define a novel query segmentation model for matching keywords to table columns, and a robust mechanism of exploiting content overlap across table columns. We design efficient inference algorithms based on bipartite matching and constrained graph cuts to solve the joint labeling task. Experiments on a workload of 59 queries over a 25 million web table corpus shows significant boost in accuracy over baseline IR methods.</description><identifier>EISSN: 2331-8422</identifier><identifier>DOI: 10.48550/arxiv.1207.0132</identifier><language>eng</language><publisher>Ithaca: Cornell University Library, arXiv.org</publisher><subject>Algorithms ; Columns (structural) ; Computer Science - Databases ; Graph matching ; Graph theory ; Keywords ; Knowledge base ; Mapping ; Model matching ; Queries ; Redundancy ; Search engines ; Segmentation ; Websites ; Workload</subject><ispartof>arXiv.org, 2012-06</ispartof><rights>2012. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). 
Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,776,780,881,27904</link.rule.ids><backlink>$$Uhttps://doi.org/10.14778/2336664.2336665$$DView published paper (Access to full text may be restricted)$$Hfree_for_read</backlink><backlink>$$Uhttps://doi.org/10.48550/arXiv.1207.0132$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Pimplikar, Rakesh</creatorcontrib><creatorcontrib>Sarawagi, Sunita</creatorcontrib><title>Answering Table Queries on the Web using Column Keywords</title><title>arXiv.org</title><description>We present the design of a structured search engine which returns a multi-column table in response to a query consisting of keywords describing each of its columns. We answer such queries by exploiting the millions of tables on the Web because these are much richer sources of structured knowledge than free-format text. However, a corpus of tables harvested from arbitrary HTML web pages presents huge challenges of diversity and redundancy not seen in centrally edited knowledge bases. We concentrate on one concrete task in this paper. Given a set of Web tables T1, . . ., Tn, and a query Q with q sets of keywords Q1, . . ., Qq, decide for each Ti if it is relevant to Q and if so, identify the mapping between the columns of Ti and query columns. We represent this task as a graphical model that jointly maps all tables by incorporating diverse sources of clues spanning matches in different parts of the table, corpus-wide co-occurrence statistics, and content overlap across table columns. 
We define a novel query segmentation model for matching keywords to table columns, and a robust mechanism of exploiting content overlap across table columns. We design efficient inference algorithms based on bipartite matching and constrained graph cuts to solve the joint labeling task. Experiments on a workload of 59 queries over a 25 million web table corpus shows significant boost in accuracy over baseline IR methods.</description><subject>Algorithms</subject><subject>Columns (structural)</subject><subject>Computer Science - Databases</subject><subject>Graph matching</subject><subject>Graph theory</subject><subject>Keywords</subject><subject>Knowledge base</subject><subject>Mapping</subject><subject>Model matching</subject><subject>Queries</subject><subject>Redundancy</subject><subject>Search engines</subject><subject>Segmentation</subject><subject>Websites</subject><subject>Workload</subject><issn>2331-8422</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2012</creationdate><recordtype>article</recordtype><sourceid>BENPR</sourceid><sourceid>GOX</sourceid><recordid>eNotj99LwzAUhYMgOObefZKAz63JTbLePo7hj-FAhIKPJVlvtKNrZ7I699_bOp8uh_txOB9jN1KkGo0R9zb81N-pBJGlQiq4YBNQSiaoAa7YLMatEALmGRijJgwXbTxSqNsPXljXEH_rh0SRdy0_fBJ_J8f7OL6XXdPvWv5Cp2MXqnjNLr1tIs3-75QVjw_F8jlZvz6tlot1Yo3EhDRWGXmvpasMSgTp8o3WUCGS9VYbk5PKnQNF2viN8uicq5xDozQIm6spuz3X_lmV-1DvbDiVo1052g3A3RnYh-6rp3got10f2mFSOTBzMKAlql8tc1GF</recordid><startdate>20120630</startdate><enddate>20120630</enddate><creator>Pimplikar, Rakesh</creator><creator>Sarawagi, Sunita</creator><general>Cornell University Library, 
arXiv.org</general><scope>8FE</scope><scope>8FG</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>HCIFZ</scope><scope>L6V</scope><scope>M7S</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20120630</creationdate><title>Answering Table Queries on the Web using Column Keywords</title><author>Pimplikar, Rakesh ; Sarawagi, Sunita</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a518-e48d7eff41bd581821b9c442d88eafa4559e39bb23e45fc3f8bbbdbb853420a93</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2012</creationdate><topic>Algorithms</topic><topic>Columns (structural)</topic><topic>Computer Science - Databases</topic><topic>Graph matching</topic><topic>Graph theory</topic><topic>Keywords</topic><topic>Knowledge base</topic><topic>Mapping</topic><topic>Model matching</topic><topic>Queries</topic><topic>Redundancy</topic><topic>Search engines</topic><topic>Segmentation</topic><topic>Websites</topic><topic>Workload</topic><toplevel>online_resources</toplevel><creatorcontrib>Pimplikar, Rakesh</creatorcontrib><creatorcontrib>Sarawagi, Sunita</creatorcontrib><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Materials Science & Engineering Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>SciTech Premium 
Collection</collection><collection>ProQuest Engineering Collection</collection><collection>Engineering Database</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering Collection</collection><collection>arXiv Computer Science</collection><collection>arXiv.org</collection><jtitle>arXiv.org</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Pimplikar, Rakesh</au><au>Sarawagi, Sunita</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Answering Table Queries on the Web using Column Keywords</atitle><jtitle>arXiv.org</jtitle><date>2012-06-30</date><risdate>2012</risdate><eissn>2331-8422</eissn><abstract>We present the design of a structured search engine which returns a multi-column table in response to a query consisting of keywords describing each of its columns. We answer such queries by exploiting the millions of tables on the Web because these are much richer sources of structured knowledge than free-format text. However, a corpus of tables harvested from arbitrary HTML web pages presents huge challenges of diversity and redundancy not seen in centrally edited knowledge bases. We concentrate on one concrete task in this paper. Given a set of Web tables T1, . . ., Tn, and a query Q with q sets of keywords Q1, . . ., Qq, decide for each Ti if it is relevant to Q and if so, identify the mapping between the columns of Ti and query columns. We represent this task as a graphical model that jointly maps all tables by incorporating diverse sources of clues spanning matches in different parts of the table, corpus-wide co-occurrence statistics, and content overlap across table columns. 
We define a novel query segmentation model for matching keywords to table columns, and a robust mechanism of exploiting content overlap across table columns. We design efficient inference algorithms based on bipartite matching and constrained graph cuts to solve the joint labeling task. Experiments on a workload of 59 queries over a 25 million web table corpus shows significant boost in accuracy over baseline IR methods.</abstract><cop>Ithaca</cop><pub>Cornell University Library, arXiv.org</pub><doi>10.48550/arxiv.1207.0132</doi><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	EISSN: 2331-8422
ispartof	arXiv.org, 2012-06
issn	2331-8422
language	eng
recordid	cdi_arxiv_primary_1207_0132
source	arXiv.org; Free E- Journals
subjects	Algorithms ; Columns (structural) ; Computer Science - Databases ; Graph matching ; Graph theory ; Keywords ; Knowledge base ; Mapping ; Model matching ; Queries ; Redundancy ; Search engines ; Segmentation ; Websites ; Workload
title	Answering Table Queries on the Web using Column Keywords
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-27T03%3A51%3A16IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_arxiv&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Answering%20Table%20Queries%20on%20the%20Web%20using%20Column%20Keywords&rft.jtitle=arXiv.org&rft.au=Pimplikar,%20Rakesh&rft.date=2012-06-30&rft.eissn=2331-8422&rft_id=info:doi/10.48550/arxiv.1207.0132&rft_dat=%3Cproquest_arxiv%3E2076252418%3C/proquest_arxiv%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2076252418&rft_id=info:pmid/&rfr_iscdi=true