Fluent Translations from Disfluent Speech in End-to-End Speech Translation
Main authors: | , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Summary: | Spoken language translation applications for speech suffer due to
conversational speech phenomena, particularly the presence of disfluencies.
With the rise of end-to-end speech translation models, processing steps such as
disfluency removal that were previously an intermediate step between speech
recognition and machine translation need to be incorporated into model
architectures. We use a sequence-to-sequence model to translate from noisy,
disfluent speech to fluent text with disfluencies removed using the recently
collected 'copy-edited' references for the Fisher Spanish-English dataset. We
are able to directly generate fluent translations and introduce considerations
about how to evaluate success on this task. This work provides a baseline for a
new task, the translation of conversational speech with joint removal of
disfluencies. |
---|---|
DOI: | 10.48550/arxiv.1906.00556 |