SpeechBERTScore: Reference-Aware Automatic Evaluation of Speech Generation Leveraging NLP Evaluation Metrics
Main authors: | , , , , |
---|---|
Format: | Article |
Language: | eng |
Summary: | While subjective assessments have been the gold standard for evaluating
speech generation, there is a growing need for objective metrics that are
highly correlated with human subjective judgments due to their cost efficiency.
This paper proposes reference-aware automatic evaluation methods for speech
generation inspired by evaluation metrics in natural language processing. The
proposed SpeechBERTScore computes the BERTScore for self-supervised dense
speech features of the generated and reference speech, which can have different
sequential lengths. We also propose SpeechBLEU and SpeechTokenDistance, which
are computed on speech discrete tokens. The evaluations on synthesized speech
show that our method correlates better with human subjective ratings than mel
cepstral distortion and a recent mean opinion score prediction model. Also,
they are effective in noisy speech evaluation and have cross-lingual
applicability. |
---|---|
DOI: | 10.48550/arxiv.2401.16812 |
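
The summary describes SpeechBERTScore as BERTScore computed on dense speech features of the generated and reference utterances, which may differ in length. The following is a minimal sketch of that idea, not the authors' implementation. It assumes the features have already been extracted as arrays of shape `(frames, dim)`; the function name `bertscore_from_features` and the `eps` parameter are hypothetical, and the summary does not say which of precision, recall, or F1 is reported, so all three are returned.

```python
import numpy as np


def bertscore_from_features(gen, ref, eps=1e-12):
    """Greedy-matching BERTScore between two feature sequences.

    gen: array of shape (T_gen, D), features of the generated speech.
    ref: array of shape (T_ref, D), features of the reference speech.
    T_gen and T_ref may differ. Returns (precision, recall, f1).
    Assumption: features are precomputed by some self-supervised model.
    """
    gen = np.asarray(gen, dtype=np.float64)
    ref = np.asarray(ref, dtype=np.float64)

    # L2-normalize each frame so that dot products are cosine similarities.
    gen_n = gen / (np.linalg.norm(gen, axis=1, keepdims=True) + eps)
    ref_n = ref / (np.linalg.norm(ref, axis=1, keepdims=True) + eps)

    # Pairwise cosine similarity matrix of shape (T_gen, T_ref).
    sim = gen_n @ ref_n.T

    # Precision: each generated frame is matched to its best reference frame.
    precision = sim.max(axis=1).mean()
    # Recall: each reference frame is matched to its best generated frame.
    recall = sim.max(axis=0).mean()
    # F1: harmonic mean of the two (guarded against a zero denominator).
    f1 = 2 * precision * recall / (precision + recall + eps)
    return float(precision), float(recall), float(f1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref_feats = rng.standard_normal((120, 16))  # stand-in reference features
    gen_feats = rng.standard_normal((95, 16))   # stand-in generated features
    print(bertscore_from_features(gen_feats, ref_feats))
```

Because matching is done per frame with a maximum over the other sequence, no alignment or length normalization is needed beforehand, which is what lets the score handle sequences of different lengths.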