Towards Question-Answering as an Automatic Metric for Evaluating the Content Quality of a Summary
Saved in:

| Field | Value |
|---|---|
| Main authors | , , |
| Format | Article |
| Language | English |
| Subjects | |
| Online access | Order full text |
Abstract:

A desirable property of a reference-based evaluation metric that measures the content quality of a summary is that it should estimate how much information that summary has in common with a reference. Traditional text-overlap-based metrics such as ROUGE fail to achieve this because they are limited to matching tokens, either lexically or via embeddings. In this work, we propose a metric to evaluate the content quality of a summary using question answering (QA). QA-based methods directly measure a summary's information overlap with a reference, making them fundamentally different from text-overlap metrics. We demonstrate the experimental benefits of QA-based metrics through an analysis of our proposed metric, QAEval. QAEval outperforms current state-of-the-art metrics on most evaluations using benchmark datasets, while being competitive on others due to limitations of state-of-the-art models. Through a careful analysis of each component of QAEval, we identify its performance bottlenecks and estimate that its potential upper-bound performance surpasses all other automatic metrics, approaching that of the gold-standard Pyramid Method.
DOI: 10.48550/arxiv.2010.00490
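The abstract describes the QA-based idea only at a high level: probe a candidate summary with questions derived from the reference and score how many it answers correctly. The sketch below illustrates that general scoring loop; it is not the authors' QAEval implementation. It assumes question-answer pairs have already been generated from the reference summary, and the QA model name, the token-level F1 scorer, and the toy example are all illustrative choices.

```python
# Minimal sketch of a QA-based content-quality score (illustrative, not QAEval itself).
# Assumes (question, gold_answer) pairs were already generated from the reference.
from transformers import pipeline

# Any extractive QA model works here; this SQuAD-tuned model is an arbitrary choice.
qa_model = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")

def token_f1(pred: str, gold: str) -> float:
    """Token-level F1 between a predicted answer and a gold answer."""
    pred_toks, gold_toks = pred.lower().split(), gold.lower().split()
    # Count how many gold tokens the prediction recovers (with multiplicity).
    common = sum(min(pred_toks.count(t), gold_toks.count(t)) for t in set(gold_toks))
    if not pred_toks or not gold_toks or common == 0:
        return 0.0
    precision = common / len(pred_toks)
    recall = common / len(gold_toks)
    return 2 * precision * recall / (precision + recall)

def qa_content_score(summary: str, qa_pairs: list[tuple[str, str]]) -> float:
    """Average answer F1 over QA pairs derived from the reference summary."""
    scores = []
    for question, gold_answer in qa_pairs:
        # Answer each reference-derived question using the candidate summary as context.
        pred = qa_model(question=question, context=summary)["answer"]
        scores.append(token_f1(pred, gold_answer))
    return sum(scores) / len(scores) if scores else 0.0

# Toy usage: each pair probes a fact a content-complete summary should contain.
pairs = [("What metric do the authors propose?", "QAEval")]
candidate = "The authors propose QAEval, a QA-based metric for summary content quality."
print(qa_content_score(candidate, pairs))
```

A summary that omits a fact from the reference cannot answer the corresponding question, so the score reflects information overlap directly rather than surface token overlap, which is the distinction from ROUGE that the abstract emphasizes.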