Improving End-to-End Speech Translation by Imitation-Based Knowledge Distillation with Synthetic Transcripts
Format: Article
Language: English
Online access: Order full text
Abstract: In Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023), pages 89-101. End-to-end automatic speech translation (AST) relies on data that combines audio inputs with text translation outputs. Previous work used existing large parallel corpora of transcriptions and translations in a knowledge distillation (KD) setup to distill a neural machine translation (NMT) model into an AST student model. While KD allows using larger pretrained models, the reliance of previous KD approaches on manual audio transcripts in the data pipeline restricts the applicability of this framework to AST. We present an imitation learning approach in which a teacher NMT system corrects the errors of an AST student without relying on manual transcripts. We show that the NMT teacher can recover from errors in automatic transcriptions and correct erroneous translations of the AST student, leading to improvements of about 4 BLEU points over the standard end-to-end AST baseline on the English-German CoVoST-2 and MuST-C datasets. Code and data are publicly available at https://github.com/HubReb/imitkd_ast/releases/tag/v1.1.
DOI: 10.48550/arxiv.2307.08426
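The abstract only summarizes the training idea at a high level. The toy sketch below illustrates one plausible reading of the imitation-based KD loop it describes: the student rolls out its own decoding prefix, and a frozen teacher supplies the corrective next-token distribution the student is trained to imitate. Everything here (the `ToyModel` class, vocabulary size, optimizer settings) is an illustrative assumption, not the authors' released implementation; in the paper the student is an AST model conditioned on audio and the teacher is an NMT model conditioned on an automatic transcript.

```python
# Hypothetical sketch of an imitation-learning KD loop (not the paper's code).
# A frozen "teacher" corrects the rollout of a trainable "student" by providing
# target next-token distributions at each decoding step.
import torch
import torch.nn.functional as F

VOCAB = 32  # toy vocabulary size (assumption)

class ToyModel(torch.nn.Module):
    """Stand-in for a real sequence model: maps a prefix to next-token logits."""
    def __init__(self):
        super().__init__()
        self.emb = torch.nn.Embedding(VOCAB, 16)
        self.out = torch.nn.Linear(16, VOCAB)

    def forward(self, prefix):            # prefix: (batch, t) token ids
        h = self.emb(prefix).mean(dim=1)  # crude pooling over the prefix
        return self.out(h)                # (batch, VOCAB) next-token logits

student = ToyModel()   # plays the AST student (audio -> translation)
teacher = ToyModel()   # plays the NMT teacher (transcript -> translation)
teacher.eval()
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

prefix = torch.randint(0, VOCAB, (4, 1))    # start tokens for a toy batch
for step in range(5):                       # roll out a few decoding steps
    s_logits = student(prefix)
    with torch.no_grad():
        t_logits = teacher(prefix)          # teacher "corrects" the same prefix
    # KD/imitation objective: match the teacher's next-token distribution.
    loss = F.kl_div(F.log_softmax(s_logits, dim=-1),
                    F.softmax(t_logits, dim=-1), reduction="batchmean")
    opt.zero_grad()
    loss.backward()
    opt.step()
    # Continue the rollout from the *student's own* prediction (on-policy),
    # which is what lets the teacher correct the student's mistakes rather
    # than only rescoring gold prefixes.
    nxt = s_logits.argmax(dim=-1, keepdim=True)
    prefix = torch.cat([prefix, nxt], dim=1)
```

The on-policy rollout in the last two lines is the key difference from plain word-level KD: the teacher is queried on prefixes the student actually produces, so its supervision targets the student's own errors, which is the imitation-learning aspect the abstract refers to.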