Transformer-Based Multilingual Automatic Speech Recognition (ASR) Model for Dravidian Languages

Bibliographic Details
Main authors: Chowdary, Divi Eswar; Ganesan, Rahul; Dabbara, Harsha; Jyothish Lal, G; Premjith, B
Format: Book Chapter
Language: English
Subjects:
Online access: Full text
container_title Automatic Speech Recognition and Translation for Low Resource Languages
container_start_page 259
container_end_page 273
creator Chowdary, Divi Eswar; Ganesan, Rahul; Dabbara, Harsha; Jyothish Lal, G; Premjith, B
description India has a rich linguistic diversity, with over 1600 Indigenous languages, many of which are experiencing a cultural decline due to limited accessibility, awareness, and information. In recent years, modeling techniques such as recurrent neural networks (RNN) and hidden Markov models (HMM) have been applied to automatic speech recognition (ASR) for low-resource languages, but their performance is limited by the availability of quality datasets, and the scarcity of high-quality data is a particular obstacle for Indian languages. Transformers, on the other hand, have emerged as a popular and effective deep learning architecture for ASR because of their pre-trained parameters and fine-tuning capabilities. OpenAI's Whisper model is an ASR system trained on a vast amount of multilingual and multitask data collected from the web, and its capabilities have made it the new benchmark for ASR. While Whisper recognizes some Indian languages, it has no specific training for Dravidian languages. These languages are of particular interest because of their common roots with other Indian languages and the unique challenges they pose as languages spoken natively by low-resource populations. The aim of this chapter is to develop a multilingual ASR model for Dravidian languages such as Tamil and Telugu by leveraging the Whisper model and evaluating it with speech performance metrics, including word error rate (WER). We obtained 61.2% WER for Telugu and 27.2% WER for Tamil using the minimal Whisper configuration, results that are significantly better than those of other existing models.
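The description above centers on two technical pieces: running a pre-trained Whisper checkpoint on Dravidian-language audio and scoring its output with word error rate (WER). The Python sketch below is purely illustrative and is not the authors' pipeline; the checkpoint name (openai/whisper-tiny), the audio path, the reference transcript, and the use of librosa and jiwer are assumptions made for the example.

# Hypothetical sketch: transcribe one Tamil clip with a small pre-trained
# Whisper checkpoint and compute WER against a reference transcript.
import torch
import librosa                                     # assumed for 16 kHz audio loading
from transformers import WhisperProcessor, WhisperForConditionalGeneration
from jiwer import wer                              # standard WER implementation

processor = WhisperProcessor.from_pretrained("openai/whisper-tiny")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny")
model.eval()

# Whisper expects 16 kHz mono audio; the file path is a placeholder.
audio, _ = librosa.load("tamil_clip.wav", sr=16000)
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")

# Force transcription in Tamil instead of letting the model auto-detect the language.
prompt_ids = processor.get_decoder_prompt_ids(language="tamil", task="transcribe")
with torch.no_grad():
    pred_ids = model.generate(input_features=inputs.input_features,
                              forced_decoder_ids=prompt_ids)
hypothesis = processor.batch_decode(pred_ids, skip_special_tokens=True)[0]

reference = "..."  # ground-truth transcript for the clip (placeholder)
print("WER:", wer(reference, hypothesis))

On real data, the same wer() call would be applied over an entire Tamil or Telugu test set rather than a single clip, and fine-tuning the checkpoint on in-language data would typically precede such an evaluation.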
doi_str_mv 10.1002/9781394214624.ch13
format Book Chapter
contributor (editors) Renuka, D. Karthika; Kumar, L. Ashok; Chakravarthi, Bharathi Raja; Mandl, Thomas
publisher John Wiley & Sons, Incorporated (United States)
rights 2024 Scrivener Publishing LLC
isbn 9781394213580; 1394213581
eisbn 9781394214129; 139421412X; 9781394214624; 1394214626
oclc 1427668168
tpages 15
fulltext fulltext
identifier ISBN: 9781394213580
ispartof Automatic Speech Recognition and Translation for Low Resource Languages, 2024, p.259-273
language eng
recordid cdi_wiley_ebooks_10_1002_9781394214624_ch13_ch13
source O'Reilly Online Learning: Academic/Public Library Edition
subjects automatic speech recognition
Dravidian language
fine-tuning
low‐resource languages
multilingual model
transfer learning
transformers
title Transformer-Based Multilingual Automatic Speech Recognition (ASR) Model for Dravidian Languages
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-18T01%3A09%3A55IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_wiley&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=bookitem&rft.atitle=Transformer-Based%20Multilingual%20Automatic%20Speech%20Recognition%20(ASR)%20Model%20for%20Dravidian%20Languages&rft.btitle=Automatic%20Speech%20Recognition%20and%20Translation%20for%20Low%20Resource%20Languages&rft.au=Chowdary,%20Divi%20Eswar&rft.date=2024&rft.spage=259&rft.epage=273&rft.pages=259-273&rft.isbn=9781394213580&rft.isbn_list=1394213581&rft_id=info:doi/10.1002/9781394214624.ch13&rft_dat=%3Cproquest_wiley%3EEBC31214541_202_285%3C/proquest_wiley%3E%3Curl%3E%3C/url%3E&rft.eisbn=9781394214129&rft.eisbn_list=139421412X&rft.eisbn_list=9781394214624&rft.eisbn_list=1394214626&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=EBC31214541_202_285&rft_id=info:pmid/&rfr_iscdi=true