Automatic Speech Recognition of Vietnamese for a New Large-Scale Corpus

Bibliographic details
Published in:	Electronics (Basel), 2024-03, Vol. 13 (5), p. 977
Main authors:	Tran, Linh Thi Thuc; Kim, Han-Gyu; La, Hoang Minh; Van Pham, Su
Format:	Article
Language:	English
Subject terms:	Analysis; Automatic speech recognition; Computational linguistics; Corpus analysis; Corpus linguistics; Datasets; Dialects; Females; Internet; Language; Language processing; Natural language interfaces; Neural networks; Open access; Reading; Regional dialects; Regions; Simulation; Speaking; Speech; Speech recognition; Tonal languages; Topics; Transcription; Vietnamese; Voice recognition

Description
Abstract:	Vietnamese is an under-resourced language, and demand for a large-scale, high-quality Vietnamese speech corpus is increasing. We introduce a new large-scale Vietnamese speech corpus of 100.5 h collected from various audio sources on the Internet. The raw audio was processed to obtain clean speech, which was then transcribed manually. We analyzed the new corpus in terms of gender, topic and regional dialect; the results show that it has good diversity across all three. We also evaluated the corpus in multiple scenarios using state-of-the-art automatic speech recognition (ASR) models such as Listen, Attend and Spell (LAS) and Speech-Transformer. This is the first time these models have been applied to Vietnamese speech recognition, and they obtained reasonable results. Simulation results showed that the new corpus would make a good dataset for Vietnamese ASR tasks because it correctly reflects the difficulty of recognizing speech from different dialects and topic domains.
ISSN:	2079-9292
DOI:	10.3390/electronics13050977