An Embarrassingly Simple Approach for LLM with Strong ASR Capacity

In this paper, we focus on solving one of the most important tasks in the field of speech processing, i.e., automatic speech recognition (ASR), with speech foundation encoders and large language models (LLM). Recent works have complex designs such as compressing the output temporally for the speech...

Bibliographic Details
Published in:	arXiv.org, 2024-02
Main authors:	Ma, Ziyang, Yang, Guanrou, Yang, Yifan, Gao, Zhifu, Wang, Jiaming, Du, Zhihao, Fan, Yu, Chen, Qian, Zheng, Siqi, Zhang, Shiliang, Xie, Chen
Format:	Article
Language:	English
Subjects:	Alignment, Audio data, Automatic speech recognition, Benchmarks, Large language models, Speech, Speech encoders, Speech processing
Online access:	Full text