Answering Table Queries on the Web using Column Keywords

We present the design of a structured search engine which returns a multi-column table in response to a query consisting of keywords describing each of its columns. We answer such queries by exploiting the millions of tables on the Web because these are much richer sources of structured knowledge th...

Detailed description

Bibliographic details
Published in: arXiv.org 2012-06
Main authors: Pimplikar, Rakesh; Sarawagi, Sunita
Format: Article
Language: eng
Online access: Full text
container_title arXiv.org
creator Pimplikar, Rakesh
Sarawagi, Sunita
description We present the design of a structured search engine that returns a multi-column table in response to a query consisting of keywords describing each of its columns. We answer such queries by exploiting the millions of tables on the Web because these are much richer sources of structured knowledge than free-format text. However, a corpus of tables harvested from arbitrary HTML web pages presents huge challenges of diversity and redundancy not seen in centrally edited knowledge bases. We concentrate on one concrete task in this paper. Given a set of Web tables T1, ..., Tn, and a query Q with q sets of keywords Q1, ..., Qq, decide for each Ti whether it is relevant to Q and, if so, identify the mapping between the columns of Ti and the query columns. We represent this task as a graphical model that jointly maps all tables by incorporating diverse sources of clues spanning matches in different parts of the table, corpus-wide co-occurrence statistics, and content overlap across table columns. We define a novel query segmentation model for matching keywords to table columns, and a robust mechanism for exploiting content overlap across table columns. We design efficient inference algorithms based on bipartite matching and constrained graph cuts to solve the joint labeling task. Experiments on a workload of 59 queries over a corpus of 25 million web tables show a significant boost in accuracy over baseline IR methods.
doi_str_mv 10.48550/arxiv.1207.0132
format Article
publisher Ithaca: Cornell University Library, arXiv.org
date 2012-06-30
rights 2012. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the "License"). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
backlink https://doi.org/10.14778/2336664.2336665 (View published paper; access to full text may be restricted)
backlink https://doi.org/10.48550/arXiv.1207.0132 (View paper in arXiv)
fulltext fulltext
identifier EISSN: 2331-8422
ispartof arXiv.org, 2012-06
issn 2331-8422
language eng
recordid cdi_arxiv_primary_1207_0132
source arXiv.org; Free E- Journals
subjects Algorithms
Columns (structural)
Computer Science - Databases
Graph matching
Graph theory
Keywords
Knowledge base
Mapping
Model matching
Queries
Redundancy
Search engines
Segmentation
Websites
Workload
title Answering Table Queries on the Web using Column Keywords
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-27T03%3A51%3A16IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_arxiv&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Answering%20Table%20Queries%20on%20the%20Web%20using%20Column%20Keywords&rft.jtitle=arXiv.org&rft.au=Pimplikar,%20Rakesh&rft.date=2012-06-30&rft.eissn=2331-8422&rft_id=info:doi/10.48550/arxiv.1207.0132&rft_dat=%3Cproquest_arxiv%3E2076252418%3C/proquest_arxiv%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2076252418&rft_id=info:pmid/&rfr_iscdi=true
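
The column-mapping task stated in the description (decide whether a table Ti is relevant to a query Q and, if so, map its columns to the query columns) can be illustrated with a toy sketch. This is not the authors' method: the paper describes a graphical model that combines many clues across tables, whereas the sketch below assumes a single clue (token overlap between query keywords and column headers) and a brute-force one-to-one assignment in place of an efficient bipartite matching algorithm. The function names, the Jaccard score, the relevance rule, and the sample tables are all hypothetical.

```python
from itertools import permutations


def tokens(text):
    """Lowercase a string and split it into a set of word tokens."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def map_columns(query_columns, table_headers, min_score=0.0):
    """Return (total_score, mapping), where mapping[i] is the index of the
    table column assigned to query column i, or None if no assignment exists.

    Brute-force maximum-weight one-to-one matching: adequate for the handful
    of columns in a toy example, not for a web-scale corpus.
    """
    q = [tokens(c) for c in query_columns]
    h = [tokens(c) for c in table_headers]
    if not q or len(q) > len(h):
        return 0.0, None
    best_score, best_map = 0.0, None
    for perm in permutations(range(len(h)), len(q)):
        scores = [jaccard(q[i], h[j]) for i, j in enumerate(perm)]
        # Hypothetical relevance rule: every query column needs some match.
        if min(scores) <= min_score:
            continue
        total = sum(scores)
        if total > best_score:
            best_score, best_map = total, list(perm)
    return best_score, best_map


# Hypothetical query with two column descriptions and two candidate tables.
query = ["country name", "currency"]
tables = {
    "T1": ["Currency", "Country", "ISO code"],
    "T2": ["Player", "Team", "Goals"],
}
for name, headers in tables.items():
    print(name, map_columns(query, headers))
```

In this sketch T1 is judged relevant, with the first query column mapped to its second column and the second query column to its first, while T2 receives no mapping. A realistic system would also need the content-based and corpus-wide clues that the description mentions, since headers on web tables are often missing or uninformative.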