WikiAsp: A Dataset for Multi-domain Aspect-based Summarization
Main authors: | , , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Summary: | Aspect-based summarization is the task of generating focused summaries based
on specific points of interest. Such summaries aid efficient analysis of text,
such as quickly understanding reviews or opinions from different angles.
However, due to large differences in the type of aspects for different domains
(e.g., sentiment, product features), the development of previous models has
tended to be domain-specific. In this paper, we propose WikiAsp, a large-scale
dataset for multi-domain aspect-based summarization that attempts to spur
research in the direction of open-domain aspect-based summarization.
Specifically, we build the dataset using Wikipedia articles from 20 different
domains, using the section titles and boundaries of each article as a proxy for
aspect annotation. We propose several straightforward baseline models for this
task and conduct experiments on the dataset. Results highlight key challenges
that existing summarization models face in this setting, such as proper pronoun
handling of quoted sources and consistent explanation of time-sensitive events. |
---|---|
DOI: | 10.48550/arxiv.2011.07832 |