Answering Table Queries on the Web using Column Keywords
We present the design of a structured search engine which returns a multi-column table in response to a query consisting of keywords describing each of its columns. We answer such queries by exploiting the millions of tables on the Web because these are much richer sources of structured knowledge th...
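Published in: | arXiv.org 2012-06 |
---|---|
Main authors: | Pimplikar, Rakesh; Sarawagi, Sunita |
Format: | Article |
Language: | eng |
Subjects: | Algorithms; Columns (structural); Computer Science - Databases; Graph matching; Graph theory; Keywords; Knowledge base; Mapping; Model matching; Queries; Redundancy; Search engines; Segmentation; Websites; Workload |
Online access: | Full text |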
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | arXiv.org |
container_volume | |
creator | Pimplikar, Rakesh; Sarawagi, Sunita |
description | We present the design of a structured search engine which returns a multi-column table in response to a query consisting of keywords describing each of its columns. We answer such queries by exploiting the millions of tables on the Web because these are much richer sources of structured knowledge than free-format text. However, a corpus of tables harvested from arbitrary HTML web pages presents huge challenges of diversity and redundancy not seen in centrally edited knowledge bases. We concentrate on one concrete task in this paper. Given a set of Web tables T1, …, Tn, and a query Q with q sets of keywords Q1, …, Qq, decide for each Ti whether it is relevant to Q and, if so, identify the mapping between the columns of Ti and the query columns. We represent this task as a graphical model that jointly maps all tables by incorporating diverse sources of clues spanning matches in different parts of the table, corpus-wide co-occurrence statistics, and content overlap across table columns. We define a novel query segmentation model for matching keywords to table columns, and a robust mechanism of exploiting content overlap across table columns. We design efficient inference algorithms based on bipartite matching and constrained graph cuts to solve the joint labeling task. Experiments on a workload of 59 queries over a 25 million web table corpus show a significant boost in accuracy over baseline IR methods. |
doi_str_mv | 10.48550/arxiv.1207.0132 |
format | Article |
fullrecord | <record><control><sourceid>proquest_arxiv</sourceid><recordid>TN_cdi_arxiv_primary_1207_0132</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2076252418</sourcerecordid><originalsourceid>FETCH-LOGICAL-a518-e48d7eff41bd581821b9c442d88eafa4559e39bb23e45fc3f8bbbdbb853420a93</originalsourceid><addsrcrecordid>eNotj99LwzAUhYMgOObefZKAz63JTbLePo7hj-FAhIKPJVlvtKNrZ7I699_bOp8uh_txOB9jN1KkGo0R9zb81N-pBJGlQiq4YBNQSiaoAa7YLMatEALmGRijJgwXbTxSqNsPXljXEH_rh0SRdy0_fBJ_J8f7OL6XXdPvWv5Cp2MXqnjNLr1tIs3-75QVjw_F8jlZvz6tlot1Yo3EhDRWGXmvpasMSgTp8o3WUCGS9VYbk5PKnQNF2viN8uicq5xDozQIm6spuz3X_lmV-1DvbDiVo1052g3A3RnYh-6rp3got10f2mFSOTBzMKAlql8tc1GF</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2076252418</pqid></control><display><type>article</type><title>Answering Table Queries on the Web using Column Keywords</title><source>arXiv.org</source><source>Free E- Journals</source><creator>Pimplikar, Rakesh ; Sarawagi, Sunita</creator><creatorcontrib>Pimplikar, Rakesh ; Sarawagi, Sunita</creatorcontrib><description>We present the design of a structured search engine which returns a multi-column table in response to a query consisting of keywords describing each of its columns. We answer such queries by exploiting the millions of tables on the Web because these are much richer sources of structured knowledge than free-format text. However, a corpus of tables harvested from arbitrary HTML web pages presents huge challenges of diversity and redundancy not seen in centrally edited knowledge bases. We concentrate on one concrete task in this paper. Given a set of Web tables T1, . . ., Tn, and a query Q with q sets of keywords Q1, . . ., Qq, decide for each Ti if it is relevant to Q and if so, identify the mapping between the columns of Ti and query columns. 
We represent this task as a graphical model that jointly maps all tables by incorporating diverse sources of clues spanning matches in different parts of the table, corpus-wide co-occurrence statistics, and content overlap across table columns. We define a novel query segmentation model for matching keywords to table columns, and a robust mechanism of exploiting content overlap across table columns. We design efficient inference algorithms based on bipartite matching and constrained graph cuts to solve the joint labeling task. Experiments on a workload of 59 queries over a 25 million web table corpus shows significant boost in accuracy over baseline IR methods.</description><identifier>EISSN: 2331-8422</identifier><identifier>DOI: 10.48550/arxiv.1207.0132</identifier><language>eng</language><publisher>Ithaca: Cornell University Library, arXiv.org</publisher><subject>Algorithms ; Columns (structural) ; Computer Science - Databases ; Graph matching ; Graph theory ; Keywords ; Knowledge base ; Mapping ; Model matching ; Queries ; Redundancy ; Search engines ; Segmentation ; Websites ; Workload</subject><ispartof>arXiv.org, 2012-06</ispartof><rights>2012. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). 
Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,776,780,881,27904</link.rule.ids><backlink>$$Uhttps://doi.org/10.14778/2336664.2336665$$DView published paper (Access to full text may be restricted)$$Hfree_for_read</backlink><backlink>$$Uhttps://doi.org/10.48550/arXiv.1207.0132$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Pimplikar, Rakesh</creatorcontrib><creatorcontrib>Sarawagi, Sunita</creatorcontrib><title>Answering Table Queries on the Web using Column Keywords</title><title>arXiv.org</title><description>We present the design of a structured search engine which returns a multi-column table in response to a query consisting of keywords describing each of its columns. We answer such queries by exploiting the millions of tables on the Web because these are much richer sources of structured knowledge than free-format text. However, a corpus of tables harvested from arbitrary HTML web pages presents huge challenges of diversity and redundancy not seen in centrally edited knowledge bases. We concentrate on one concrete task in this paper. Given a set of Web tables T1, . . ., Tn, and a query Q with q sets of keywords Q1, . . ., Qq, decide for each Ti if it is relevant to Q and if so, identify the mapping between the columns of Ti and query columns. We represent this task as a graphical model that jointly maps all tables by incorporating diverse sources of clues spanning matches in different parts of the table, corpus-wide co-occurrence statistics, and content overlap across table columns. 
We define a novel query segmentation model for matching keywords to table columns, and a robust mechanism of exploiting content overlap across table columns. We design efficient inference algorithms based on bipartite matching and constrained graph cuts to solve the joint labeling task. Experiments on a workload of 59 queries over a 25 million web table corpus shows significant boost in accuracy over baseline IR methods.</description><subject>Algorithms</subject><subject>Columns (structural)</subject><subject>Computer Science - Databases</subject><subject>Graph matching</subject><subject>Graph theory</subject><subject>Keywords</subject><subject>Knowledge base</subject><subject>Mapping</subject><subject>Model matching</subject><subject>Queries</subject><subject>Redundancy</subject><subject>Search engines</subject><subject>Segmentation</subject><subject>Websites</subject><subject>Workload</subject><issn>2331-8422</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2012</creationdate><recordtype>article</recordtype><sourceid>BENPR</sourceid><sourceid>GOX</sourceid><recordid>eNotj99LwzAUhYMgOObefZKAz63JTbLePo7hj-FAhIKPJVlvtKNrZ7I699_bOp8uh_txOB9jN1KkGo0R9zb81N-pBJGlQiq4YBNQSiaoAa7YLMatEALmGRijJgwXbTxSqNsPXljXEH_rh0SRdy0_fBJ_J8f7OL6XXdPvWv5Cp2MXqnjNLr1tIs3-75QVjw_F8jlZvz6tlot1Yo3EhDRWGXmvpasMSgTp8o3WUCGS9VYbk5PKnQNF2viN8uicq5xDozQIm6spuz3X_lmV-1DvbDiVo1052g3A3RnYh-6rp3got10f2mFSOTBzMKAlql8tc1GF</recordid><startdate>20120630</startdate><enddate>20120630</enddate><creator>Pimplikar, Rakesh</creator><creator>Sarawagi, Sunita</creator><general>Cornell University Library, 
arXiv.org</general><scope>8FE</scope><scope>8FG</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>HCIFZ</scope><scope>L6V</scope><scope>M7S</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20120630</creationdate><title>Answering Table Queries on the Web using Column Keywords</title><author>Pimplikar, Rakesh ; Sarawagi, Sunita</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a518-e48d7eff41bd581821b9c442d88eafa4559e39bb23e45fc3f8bbbdbb853420a93</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2012</creationdate><topic>Algorithms</topic><topic>Columns (structural)</topic><topic>Computer Science - Databases</topic><topic>Graph matching</topic><topic>Graph theory</topic><topic>Keywords</topic><topic>Knowledge base</topic><topic>Mapping</topic><topic>Model matching</topic><topic>Queries</topic><topic>Redundancy</topic><topic>Search engines</topic><topic>Segmentation</topic><topic>Websites</topic><topic>Workload</topic><toplevel>online_resources</toplevel><creatorcontrib>Pimplikar, Rakesh</creatorcontrib><creatorcontrib>Sarawagi, Sunita</creatorcontrib><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Materials Science &amp; Engineering Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>SciTech Premium 
Collection</collection><collection>ProQuest Engineering Collection</collection><collection>Engineering Database</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering Collection</collection><collection>arXiv Computer Science</collection><collection>arXiv.org</collection><jtitle>arXiv.org</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Pimplikar, Rakesh</au><au>Sarawagi, Sunita</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Answering Table Queries on the Web using Column Keywords</atitle><jtitle>arXiv.org</jtitle><date>2012-06-30</date><risdate>2012</risdate><eissn>2331-8422</eissn><abstract>We present the design of a structured search engine which returns a multi-column table in response to a query consisting of keywords describing each of its columns. We answer such queries by exploiting the millions of tables on the Web because these are much richer sources of structured knowledge than free-format text. However, a corpus of tables harvested from arbitrary HTML web pages presents huge challenges of diversity and redundancy not seen in centrally edited knowledge bases. We concentrate on one concrete task in this paper. Given a set of Web tables T1, . . ., Tn, and a query Q with q sets of keywords Q1, . . ., Qq, decide for each Ti if it is relevant to Q and if so, identify the mapping between the columns of Ti and query columns. We represent this task as a graphical model that jointly maps all tables by incorporating diverse sources of clues spanning matches in different parts of the table, corpus-wide co-occurrence statistics, and content overlap across table columns. 
We define a novel query segmentation model for matching keywords to table columns, and a robust mechanism of exploiting content overlap across table columns. We design efficient inference algorithms based on bipartite matching and constrained graph cuts to solve the joint labeling task. Experiments on a workload of 59 queries over a 25 million web table corpus shows significant boost in accuracy over baseline IR methods.</abstract><cop>Ithaca</cop><pub>Cornell University Library, arXiv.org</pub><doi>10.48550/arxiv.1207.0132</doi><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 2012-06 |
issn | 2331-8422 |
language | eng |
recordid | cdi_arxiv_primary_1207_0132 |
source | arXiv.org; Free E-Journals |
subjects | Algorithms; Columns (structural); Computer Science - Databases; Graph matching; Graph theory; Keywords; Knowledge base; Mapping; Model matching; Queries; Redundancy; Search engines; Segmentation; Websites; Workload |
title | Answering Table Queries on the Web using Column Keywords |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-27T03%3A51%3A16IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_arxiv&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Answering%20Table%20Queries%20on%20the%20Web%20using%20Column%20Keywords&rft.jtitle=arXiv.org&rft.au=Pimplikar,%20Rakesh&rft.date=2012-06-30&rft.eissn=2331-8422&rft_id=info:doi/10.48550/arxiv.1207.0132&rft_dat=%3Cproquest_arxiv%3E2076252418%3C/proquest_arxiv%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2076252418&rft_id=info:pmid/&rfr_iscdi=true |
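The abstract's column-mapping step pairs each query column with at most one column of a given table, which is a bipartite matching problem. The sketch below is a minimal, hypothetical illustration of that single step only. It scores query-keyword/header pairs with plain token Jaccard similarity (an assumption: the paper combines clues from different parts of the table, corpus-wide co-occurrence statistics and content overlap in a joint graphical model with graph-cut inference, none of which is reproduced here) and finds the best one-to-one assignment by exhaustive search. The names `map_columns`, `jaccard`, `min_score` and `min_mapped`, and the relevance rule, are invented for illustration.

```python
from __future__ import annotations

from itertools import permutations
from typing import Optional


def jaccard(a: set[str], b: set[str]) -> float:
    """Token-set overlap; a hypothetical stand-in for the paper's match features."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def map_columns(
    query_cols: list[str],
    table_headers: list[str],
    min_score: float = 0.2,  # hypothetical: weaker pairs are never mapped
    min_mapped: int = 1,     # hypothetical relevance rule
) -> tuple[bool, list[Optional[int]]]:
    """Map each query column to at most one distinct table column.

    Brute-forces every one-to-one assignment, i.e. an exact maximum-weight
    bipartite matching; only practical for the few columns of a query.
    A query column may stay unmapped (None). The table counts as relevant
    if at least `min_mapped` query columns are mapped.
    """
    q_tokens = [set(k.lower().split()) for k in query_cols]
    t_tokens = [set(h.lower().split()) for h in table_headers]
    n_q = len(q_tokens)

    # Pad the table side with one "unmapped" slot per query column.
    slots: list[Optional[int]] = list(range(len(t_tokens))) + [None] * n_q

    best_score = 0.0
    best: list[Optional[int]] = [None] * n_q
    for assignment in permutations(slots, n_q):
        total, feasible = 0.0, True
        for qi, tj in enumerate(assignment):
            if tj is None:
                continue
            s = jaccard(q_tokens[qi], t_tokens[tj])
            if s < min_score:  # disallow weak edges entirely
                feasible = False
                break
            total += s
        if feasible and total > best_score:
            best_score, best = total, list(assignment)

    mapped = sum(tj is not None for tj in best)
    return mapped >= min_mapped, best


if __name__ == "__main__":
    relevant, mapping = map_columns(
        ["country", "capital city"], ["Country name", "Capital", "Population"]
    )
    print(relevant, mapping)  # True [0, 1]
```

Exhaustive search grows factorially with the number of columns; for larger inputs a polynomial-time assignment solver such as the Hungarian algorithm would be the usual substitute.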