Crowdsourcing Lightweight Pyramids for Manual Summary Evaluation
Format: Article
Language: English
Abstract: Conducting a manual evaluation is considered an essential part of summary evaluation methodology. Traditionally, the Pyramid protocol, which exhaustively compares system summaries to references, has been perceived as very reliable, providing objective scores. Yet, due to the high cost of the Pyramid method and the expertise it requires, researchers have resorted to cheaper and less thorough manual evaluation methods, such as Responsiveness and pairwise comparison, which are attainable via crowdsourcing. We revisit the Pyramid approach, proposing a lightweight, sampling-based version that is crowdsourceable. We analyze the performance of our method against original expert-based Pyramid evaluations and show that it correlates with them better than the common Responsiveness method does. We release our crowdsourced Summary Content Units, along with all crowdsourcing scripts, for future evaluations.
DOI: 10.48550/arxiv.1904.05929
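
The abstract describes the protocol only at a high level. As a rough illustration of the scoring idea, the sketch below samples Summary Content Units (SCUs) from a reference-derived pool and scores a system summary as the fraction of sampled SCUs that crowd workers judge present. The sample size, the majority-vote aggregation, the tie-break rule, and all names here are illustrative assumptions, not the paper's exact procedure.

```python
import random
from collections import Counter

def sample_scus(scu_pool, k, seed=0):
    """Sample k SCUs from the pool extracted from the reference summaries.
    (The sample size k and the fixed seed are illustrative choices.)"""
    rng = random.Random(seed)
    return rng.sample(scu_pool, min(k, len(scu_pool)))

def majority_vote(judgments):
    """Aggregate binary worker judgments (True = SCU present in summary);
    ties break toward 'present' -- this tie-break rule is an assumption."""
    counts = Counter(judgments)
    return counts[True] >= counts[False]

def lightweight_pyramid_score(judgments_per_scu):
    """Score a system summary as the fraction of sampled SCUs the crowd
    judged present; judgments_per_scu maps SCU id -> list of bools."""
    if not judgments_per_scu:
        return 0.0
    present = sum(majority_vote(j) for j in judgments_per_scu.values())
    return present / len(judgments_per_scu)

# Hypothetical run: sample 4 SCUs from a pool of 32, then score one
# system summary from 3 crowd judgments per SCU (all values made up).
pool = [f"scu_{i:02d}" for i in range(1, 33)]   # pool from references
sampled = sample_scus(pool, k=4)                # SCUs sent to workers
worker_votes = [[True, True, False], [True, True, True],
                [False, False, True], [True, False, True]]
judgments = dict(zip(sampled, worker_votes))
print(lightweight_pyramid_score(judgments))     # 0.75
```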