WikiAsp: A Dataset for Multi-domain Aspect-based Summarization

Bibliographic details
Main authors: Hayashi, Hiroaki, Budania, Prashant, Wang, Peng, Ackerson, Chris, Neervannan, Raj, Neubig, Graham
Format: Article
Language: eng
Subjects: Computer Science - Computation and Language
creator Hayashi, Hiroaki
Budania, Prashant
Wang, Peng
Ackerson, Chris
Neervannan, Raj
Neubig, Graham
description Aspect-based summarization is the task of generating focused summaries based on specific points of interest. Such summaries aid efficient analysis of text, such as quickly understanding reviews or opinions from different angles. However, due to large differences in the type of aspects for different domains (e.g., sentiment, product features), the development of previous models has tended to be domain-specific. In this paper, we propose WikiAsp, a large-scale dataset for multi-domain aspect-based summarization that attempts to spur research in the direction of open-domain aspect-based summarization. Specifically, we build the dataset using Wikipedia articles from 20 different domains, using the section titles and boundaries of each article as a proxy for aspect annotation. We propose several straightforward baseline models for this task and conduct experiments on the dataset. Results highlight key challenges that existing summarization models face in this setting, such as proper pronoun handling of quoted sources and consistent explanation of time-sensitive events.
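The description states that section titles and boundaries serve as a proxy for aspect annotation. The following is a minimal sketch of that idea, assuming plain wikitext input with level-2 headings written as `== Title ==`. The helper `split_into_aspects` is hypothetical and is not the authors' preprocessing code. It pairs each section title (used as the aspect label) with that section's text (used as the aspect-specific target). Text before the first heading is ignored here, which is an assumption of this sketch.

```python
import re
from typing import List, Tuple

# Matches level-2 wikitext headings such as "== Early life ==".
# Level-3 headings ("=== Sub ===") do not match, so their text stays
# inside the enclosing level-2 section.
HEADING = re.compile(r"^==\s*([^=].*?)\s*==\s*$", re.MULTILINE)


def split_into_aspects(wikitext: str) -> List[Tuple[str, str]]:
    """Hypothetical helper: return (aspect, section_text) pairs.

    Each level-2 section title is lower-cased and used as the aspect label;
    the text up to the next level-2 heading becomes that aspect's target.
    """
    matches = list(HEADING.finditer(wikitext))
    pairs: List[Tuple[str, str]] = []
    for i, m in enumerate(matches):
        start = m.end()
        end = matches[i + 1].start() if i + 1 < len(matches) else len(wikitext)
        body = wikitext[start:end].strip()
        if body:  # skip sections with no text
            pairs.append((m.group(1).lower(), body))
    return pairs


if __name__ == "__main__":
    article = "Intro.\n== Early life ==\nBorn in 1900.\n== Career ==\nWorked.\n"
    print(split_into_aspects(article))
```

Lower-casing the titles is one simple way to merge surface variants of the same heading; a real pipeline would likely need additional normalization of aspect labels within each domain.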
doi_str_mv 10.48550/arxiv.2011.07832
format Article
date 2020-11-16
rights http://arxiv.org/licenses/nonexclusive-distrib/1.0
link https://arxiv.org/abs/2011.07832
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2011.07832
language eng
recordid cdi_arxiv_primary_2011_07832
source arXiv.org
subjects Computer Science - Computation and Language
title WikiAsp: A Dataset for Multi-domain Aspect-based Summarization
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-08T20%3A27%3A53IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=WikiAsp:%20A%20Dataset%20for%20Multi-domain%20Aspect-based%20Summarization&rft.au=Hayashi,%20Hiroaki&rft.date=2020-11-16&rft_id=info:doi/10.48550/arxiv.2011.07832&rft_dat=%3Carxiv_GOX%3E2011_07832%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true