Style-agnostic evaluation of ASR using multiple reference transcripts
Main authors: | , , , , , , |
---|---|
Format: | Article |
Language: | English |
Subjects: | |
Summary: | Word error rate (WER) as a metric has a variety of limitations that have
plagued the field of speech recognition. Evaluation datasets suffer from
varying style, formality, and inherent ambiguity of the transcription task. In
this work, we attempt to mitigate some of these differences by performing
style-agnostic evaluation of ASR systems using multiple references transcribed
under opposing style parameters. As a result, we find that existing WER reports
are likely significantly over-estimating the number of contentful errors made
by state-of-the-art ASR systems. In addition, we have found our multireference
method to be a useful mechanism for comparing the quality of ASR models that
differ in the stylistic makeup of their training data and target task. |
---|---|
DOI: | 10.48550/arxiv.2412.07937 |
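
To make the idea in the summary concrete, the following is a minimal sketch of scoring one ASR hypothesis against several references written in different styles. It is not the authors' method: it assumes the simplest possible combination rule, taking the lowest per-reference WER, and the function names `edit_distance`, `wer`, and `min_reference_wer` are hypothetical names chosen for illustration. Text is assumed to be already normalized and whitespace-tokenized.

```python
def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Standard WER: word-level edit distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hyp) / len(ref)


def min_reference_wer(references, hypothesis):
    """Assumed combination rule (illustrative only): keep the lowest WER
    obtained against any single reference."""
    return min(wer(r, hypothesis) for r in references)


# Two hypothetical references for the same audio: verbatim vs. cleaned-up.
verbatim = "um i i want to go"
clean = "i want to go"
hypothesis = "i want to go"

single = wer(verbatim, hypothesis)                        # penalizes style
multi = min_reference_wer([verbatim, clean], hypothesis)  # style-agnostic
```

In this toy case the hypothesis is charged two deletions against the verbatim reference but none against the cleaned-up one, which illustrates how a single-reference WER can count stylistic differences as errors.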