TRScore: A Novel GPT-based Readability Scorer for ASR Segmentation and Punctuation model evaluation and selection
Main authors: |
---|---
Format: | Article
Language: | eng
Subjects: |
Online access: | Order full text
Abstract:

Punctuation and segmentation are key to readability in Automatic Speech Recognition (ASR). They are often evaluated using F1 scores, which require high-quality human transcripts and do not reflect readability well. Human evaluation is expensive, time-consuming, and suffers from large inter-observer variability, especially for conversational speech, which lacks strict grammatical structure. Large pre-trained models capture a notion of grammatical structure. We present TRScore, a novel readability measure that uses the GPT model to evaluate different segmentation and punctuation systems. We validate our approach with human experts. Additionally, our approach enables quantitative assessment of the effect of text post-processing techniques such as capitalization, inverse text normalization (ITN), and disfluency removal on overall readability, which traditional word error rate (WER) and slot error rate (SER) metrics fail to capture. TRScore is strongly correlated with traditional F1 and human readability scores, with Pearson correlation coefficients of 0.67 and 0.98, respectively. It also eliminates the need for human transcriptions for model selection.
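The abstract does not give TRScore's exact formulation. As a rough sketch of the underlying idea (a pre-trained GPT model assigns higher likelihood to well-punctuated, well-segmented text), the snippet below scores a transcript by its average per-token negative log-likelihood under GPT-2. The model choice, the helper name `score_readability`, and the example sentences are illustrative assumptions, not the paper's method.

```python
# Hypothetical illustration of a GPT-based readability signal: the average
# per-token negative log-likelihood (NLL) of a transcript under GPT-2.
# This is NOT the paper's exact TRScore formulation, only the general idea
# that a pre-trained LM assigns higher likelihood to well-punctuated text.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def score_readability(text: str) -> float:
    """Average per-token NLL of `text` (lower = more fluent)."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels == input_ids, the model shifts labels internally and
        # returns the mean next-token cross-entropy as `loss`.
        loss = model(ids, labels=ids).loss
    return loss.item()

# Candidate outputs from two punctuation/segmentation systems:
print(score_readability("i think we should meet tomorrow at noon does that work"))
print(score_readability("I think we should meet tomorrow at noon. Does that work?"))
```

Because lower values indicate more fluent text, candidate punctuation and segmentation systems could be ranked this way without reference transcripts, consistent with the abstract's claim that TRScore removes the need for human transcriptions in model selection.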
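The reported agreement with F1 and human judgments (Pearson's r of 0.67 and 0.98) is a standard correlation analysis. A minimal sketch using `scipy.stats.pearsonr` is shown below; the arrays are hypothetical placeholders, not data from the paper.

```python
# Hypothetical validation step: correlate model-based readability scores
# with human readability ratings for the same transcripts.
from scipy.stats import pearsonr

# Placeholder values only; not data from the paper.
trscore_per_transcript = [0.81, 0.64, 0.92, 0.55, 0.73]
human_rating_per_transcript = [4.2, 3.1, 4.8, 2.9, 3.7]

r, p = pearsonr(trscore_per_transcript, human_rating_per_transcript)
print(f"Pearson r = {r:.2f} (p = {p:.3f})")
```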
DOI: 10.48550/arxiv.2210.15104