Joint Speech Transcription and Translation: Pseudo-Labeling with Out-of-Distribution Data
Abstract: Self-training has been shown to be helpful in addressing data scarcity for
many domains, including vision, speech, and language. Specifically,
self-training, or pseudo-labeling, labels unsupervised data and adds that to
the training pool. In this work, we investigate and use pseudo-labeling for a
recently proposed novel setup: joint transcription and translation of speech,
which suffers from an absence of sufficient data resources. We show that under
such data-deficient circumstances, the unlabeled data can significantly vary in
domain from the supervised data, which results in pseudo-label quality
degradation. We investigate two categories of remedies that require no
additional supervision and target the domain mismatch: pseudo-label filtering
and data augmentation. We show that such pseudo-label analysis and processing
yields additional gains on top of the vanilla pseudo-labeling setup, for total
improvements of up to 0.6% absolute WER and 2.2 BLEU points.
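The pipeline the abstract describes (label unlabeled speech with the current model, filter low-quality pseudo-labels, and retrain on the enlarged pool) can be sketched as follows. This is a minimal illustration, assuming a hypothetical `model.transcribe_and_translate(waveform)` interface that returns a transcript, a translation, and a confidence score; the paper's actual models, scoring method, and thresholds are not specified here.

```python
# Minimal sketch of pseudo-labeling with confidence-based filtering,
# one of the filtering strategies the abstract alludes to. The
# `model.transcribe_and_translate` interface and the threshold value
# are illustrative assumptions, not the paper's actual API.

from dataclasses import dataclass

@dataclass
class PseudoLabel:
    audio_id: str
    transcript: str    # pseudo-label for the transcription task
    translation: str   # pseudo-label for the translation task
    confidence: float  # model score used for filtering

def generate_pseudo_labels(model, unlabeled_audio, threshold=0.9):
    """Label unsupervised audio with a joint model and keep only
    confident examples; out-of-domain utterances tend to receive
    low scores and are filtered out before retraining."""
    kept = []
    for audio_id, waveform in unlabeled_audio:
        transcript, translation, confidence = model.transcribe_and_translate(waveform)
        if confidence >= threshold:
            kept.append(PseudoLabel(audio_id, transcript, translation, confidence))
    return kept

# The surviving pseudo-labels are added to the supervised training
# pool, and the joint transcription/translation model is retrained
# on the union of supervised and pseudo-labeled data.
```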
DOI: 10.48550/arxiv.2212.09982