Document Summarization for Answering Non-Factoid Queries

Bibliographic details
Published in: IEEE Transactions on Knowledge and Data Engineering, 2018-01, Vol. 30 (1), pp. 15-28
Main authors: Yulianti, Evi; Chen, Ruey-Cheng; Scholer, Falk; Croft, W. Bruce; Sanderson, Mark
Format: Article
Language: English
container_end_page 28
container_issue 1
container_start_page 15
container_title IEEE transactions on knowledge and data engineering
container_volume 30
creator Yulianti, Evi
Chen, Ruey-Cheng
Scholer, Falk
Croft, W. Bruce
Sanderson, Mark
description We formulate a document summarization method that extracts passage-level answers to non-factoid queries, referred to as answer-biased summaries. We propose using external information from related Community Question Answering (CQA) content to better identify answer-bearing sentences. Three optimization-based methods are proposed: (i) query-biased, (ii) CQA-answer-biased, and (iii) expanded-query-biased, where expansion terms are derived from related CQA content. A learning-to-rank-based method is also proposed that incorporates a feature extracted from related CQA content. Our results show that even if a CQA answer does not contain a perfect answer to a query, its content can be exploited to improve the extraction of answer-biased summaries from other corpora. The quality of CQA content is found to affect the accuracy of optimization-based summaries, though medium-quality answers enable the system to achieve accuracy comparable (and in some cases superior) to state-of-the-art techniques. The learning-to-rank-based summaries, on the other hand, are not significantly influenced by CQA quality. We recommend how best to use the proposed approaches depending on which quality levels of related CQA content are available. As a further investigation, the reliability of our approaches was tested on another publicly available dataset.
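The three optimization-based variants named in the abstract differ mainly in which terms steer sentence selection: the query alone, the related CQA answers, or the query expanded with terms drawn from those answers. The sketch below illustrates that distinction under stated assumptions; it is not the authors' implementation. The paper's optimization-based formulation is replaced here by a simple greedy, word-budgeted term-coverage heuristic, and the tokenizer, the stopword list, the frequency-based choice of expansion terms, and all function names (`tokenize`, `expansion_terms`, `bias_terms`, `summarize`) are hypothetical. The learning-to-rank variant is not shown.

```python
import re
from collections import Counter

# Hypothetical sketch: a greedy stand-in for the paper's optimization-based
# summarizers. Tokenizer, stopwords, and expansion rule are all assumptions.
STOPWORDS = frozenset(
    "a an and are be can do for from how in is it of on or the to with you your".split()
)


def tokenize(text: str) -> list[str]:
    """Lowercase, split on non-alphanumerics, drop stopwords (no stemming)."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def expansion_terms(query: str, cqa_answers: list[str], k: int = 5) -> list[str]:
    """Assumed expansion rule: the k most frequent non-query terms in the CQA answers."""
    query_terms = set(tokenize(query))
    counts = Counter(
        t for answer in cqa_answers for t in tokenize(answer) if t not in query_terms
    )
    return [term for term, _ in counts.most_common(k)]


def bias_terms(query: str, cqa_answers: list[str], mode: str, k: int = 5) -> set[str]:
    """Build the term set that biases sentence selection for each variant."""
    query_terms = set(tokenize(query))
    if mode == "query":
        return query_terms
    if mode == "cqa-answer":
        return {t for answer in cqa_answers for t in tokenize(answer)}
    if mode == "expanded-query":
        return query_terms | set(expansion_terms(query, cqa_answers, k))
    raise ValueError(f"unknown mode: {mode!r}")


def summarize(sentences: list[str], terms: set[str], budget: int) -> list[str]:
    """Greedy budgeted coverage: repeatedly add the sentence that covers the most
    not-yet-covered bias terms without exceeding `budget` words."""
    chosen: list[int] = []
    covered: set[str] = set()
    used = 0
    while True:
        best, best_gain = None, 0
        for i, sentence in enumerate(sentences):
            length = len(sentence.split())
            if i in chosen or used + length > budget:
                continue
            gain = len((set(tokenize(sentence)) & terms) - covered)
            if gain > best_gain:  # ties keep the earlier sentence
                best, best_gain = i, gain
        if best is None:  # nothing fits or nothing adds new coverage
            break
        chosen.append(best)
        covered |= set(tokenize(sentences[best])) & terms
        used += len(sentences[best].split())
    return [sentences[i] for i in sorted(chosen)]  # restore document order
```

Greedy selection is a common approximation for budgeted coverage objectives; an exact alternative would pose the same objective as an integer linear program. On a toy query such as "how to remove rust from tools", the query-only term set can favour a sentence that merely mentions rust, whereas the CQA-derived term sets pull in sentences about vinegar and scrubbing, which is the intuition the abstract describes.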
doi_str_mv 10.1109/TKDE.2017.2754373
format Article
publisher New York: IEEE
coden ITKEEH
rights Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2018
toplevel peer_reviewed
oa free_for_read
tpages 14
orcidid https://orcid.org/0000-0001-9094-0810
https://orcid.org/0000-0003-0487-9609
https://orcid.org/0000-0003-1951-4696
linktorsrc https://ieeexplore.ieee.org/document/8046046
identifier ISSN: 1041-4347
ispartof IEEE transactions on knowledge and data engineering, 2018-01, Vol.30 (1), p.15-28
issn 1041-4347
1558-2191
language eng
recordid cdi_proquest_journals_2174542174
source IEEE Electronic Library (IEL)
subjects answer-biased summaries
CQA
Data mining
Document summarization
Feature extraction
Google
Identification methods
Knowledge discovery
learning-to-rank
non-factoid queries
Optimization
Queries
Search engines
Sentences
State of the art
Summaries
Web search
title Document Summarization for Answering Non-Factoid Queries
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-24T17%3A08%3A46IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Document%20Summarization%20for%20Answering%20Non-Factoid%20Queries&rft.jtitle=IEEE%20transactions%20on%20knowledge%20and%20data%20engineering&rft.au=Yulianti,%20Evi&rft.date=2018-01-01&rft.volume=30&rft.issue=1&rft.spage=15&rft.epage=28&rft.pages=15-28&rft.issn=1041-4347&rft.eissn=1558-2191&rft.coden=ITKEEH&rft_id=info:doi/10.1109/TKDE.2017.2754373&rft_dat=%3Cproquest_RIE%3E2174542174%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2174542174&rft_id=info:pmid/&rft_ieee_id=8046046&rfr_iscdi=true