Speech vs. Transcript: Does It Matter for Human Annotators in Speech Summarization?
Main authors: | , , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | Reference summaries for abstractive speech summarization require human
annotation, which can be performed by listening to an audio recording or by
reading textual transcripts of the recording. In this paper, we examine whether
summaries based on annotators listening to the recordings differ from those
based on annotators reading transcripts. Using existing intrinsic evaluation
based on human evaluation, automatic metrics, LLM-based evaluation, and a
retrieval-based reference-free method, we find that summaries do differ
depending on the source modality, and that speech-based summaries are
more factually consistent and information-selective than transcript-based
summaries. Meanwhile, transcript-based summaries are impacted by recognition
errors in the source, and expert-written summaries are more informative and
reliable. We make all the collected data and analysis code
public (https://github.com/cmu-mlsp/interview_humanssum) to facilitate the
reproduction of our work and advance research in this area. |
DOI: | 10.48550/arxiv.2408.07277 |