Transformer-Based Multilingual Automatic Speech Recognition (ASR) Model for Dravidian Languages
Saved in:
Main authors: , , , ,
Format: Book chapter
Language: English
Subjects:
Online access: Full text
Abstract: India has a rich linguistic diversity, with over 1600 Indigenous languages, many of which are experiencing cultural decline due to limited accessibility, awareness, and information. In recent years, deep learning techniques such as recurrent neural networks (RNNs) and hidden Markov models (HMMs) have been applied to automatic speech recognition (ASR) for low-resource languages, but their performance is constrained by the availability of quality datasets, and the scarcity of high-quality data is a major obstacle for Indian languages. Transformers, on the other hand, have emerged as a popular and effective deep learning architecture for ASR thanks to their pre-trained parameters and fine-tuning capabilities. OpenAI's Whisper model is an ASR system trained on a vast amount of multilingual and multitask data collected from the web; owing to its capabilities, it is widely considered the current benchmark for ASR. While Whisper recognizes some Indian languages, it has no specific training for Dravidian languages. These languages are of particular interest because of their common roots with other Indian languages and the unique challenges they pose, being spoken natively by populations with few language resources. The aim of this chapter is to develop a multilingual ASR model for Dravidian languages such as Tamil and Telugu by leveraging the Whisper model and evaluating it with speech performance metrics, including word error rate (WER). Using the minimal Whisper configuration, we obtained 61.2% WER for Telugu and 27.2% WER for Tamil, which is significantly better than existing models.
DOI: 10.1002/9781394214624.ch13
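The chapter's headline results (61.2% WER for Telugu, 27.2% for Tamil) are stated in terms of word error rate. As a reference point for how that metric is computed, here is a minimal self-contained sketch of word-level WER via Levenshtein edit distance; the function name `wer` and the sample strings are illustrative, not taken from the chapter (in practice a library such as `jiwer` would typically be used):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Dynamic-programming table for word-level Levenshtein distance
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,        # deletion
                d[i][j - 1] + 1,        # insertion
                d[i - 1][j - 1] + cost, # substitution (or match)
            )
    return d[len(ref)][len(hyp)] / len(ref)

# A perfect transcript scores 0.0; one substitution plus one deletion
# over a four-word reference gives 0.5 (i.e. 50% WER).
print(wer("the cat sat", "the cat sat"))  # → 0.0
print(wer("a b c d", "a x c"))            # → 0.5
```

Note that WER can exceed 100% when the hypothesis contains many insertions, which is why the chapter's 61.2% figure for Telugu, while high, still indicates partial recognition.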