Amphion: An Open-Source Audio, Music and Speech Generation Toolkit

Bibliographic Details
Published in:	arXiv.org 2024-09
Main Authors:	Zhang, Xueyao; Xue, Liumeng; Gu, Yicheng; Wang, Yuancheng; Li, Jiaqi; He, Haorui; Wang, Chaoren; Liu, Songting; Chen, Xi; Zhang, Junan; Fang, Zihao; Chen, Haopeng; Tang, Tze Ying; Zou, Lexiao; Wang, Mingxuan; Han, Jun; Chen, Kai; Li, Haizhou; Wu, Zhizheng
Format:	Article
Language:	English
Subjects:	Audio signals; Engineers; R&D; Research & development; Signal quality; Speech recognition; Toolkits; Vocoders
Online Access:	Full text

Description
Summary:	Amphion is an open-source toolkit for Audio, Music, and Speech Generation that aims to ease the way for junior researchers and engineers into these fields. It presents a unified framework that includes diverse generation tasks and models and is designed to be easily extended with new ones. The toolkit offers beginner-friendly workflows and pre-trained models, allowing both beginners and seasoned researchers to kick-start their projects with relative ease. The initial release, Amphion v0.1, supports a range of tasks including Text to Speech (TTS), Text to Audio (TTA), and Singing Voice Conversion (SVC), supplemented by essential components such as data preprocessing, state-of-the-art vocoders, and evaluation metrics. This paper presents a high-level overview of Amphion. Amphion is open-sourced at https://github.com/open-mmlab/Amphion.
ISSN:	2331-8422