ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit
Main authors: | |
---|---|
Format: | Article |
Language: | English |
Abstract: | ESPnet-ST-v2 is a revamp of the open-source ESPnet-ST toolkit, necessitated by
the broadening interests of the spoken language translation community.
ESPnet-ST-v2 supports 1) offline speech-to-text translation (ST), 2)
simultaneous speech-to-text translation (SST), and 3) offline speech-to-speech
translation (S2ST). Each task is supported with a wide variety of approaches,
which differentiates ESPnet-ST-v2 from other open-source spoken language translation
toolkits. The toolkit offers state-of-the-art architectures such as
transducers, hybrid CTC/attention, multi-decoders with searchable
intermediates, time-synchronous blockwise CTC/attention, Translatotron models,
and direct discrete unit models. In this paper, we describe the overall design,
example models for each task, and performance benchmarking behind ESPnet-ST-v2,
which is publicly available at https://github.com/espnet/espnet. |
---|---|
DOI: | 10.48550/arxiv.2304.04596 |