Continuous Punjabi speech recognition model based on Kaldi ASR toolkit

This paper presents a continuous Punjabi speech recognition model built with the Kaldi toolkit. Mel frequency cepstral coefficient (MFCC) features and perceptual linear prediction (PLP) features were extracted from continuous Punjabi speech samples. The perfor...


Bibliographic details
Published in: International journal of speech technology 2018-06, Vol.21 (2), p.211-216
Main authors: Guglani, Jyoti; Mishra, A. N.
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 216
container_issue 2
container_start_page 211
container_title International journal of speech technology
container_volume 21
creator Guglani, Jyoti
Mishra, A. N.
description This paper presents a continuous Punjabi speech recognition model built with the Kaldi toolkit. Mel frequency cepstral coefficient (MFCC) features and perceptual linear prediction (PLP) features were extracted from continuous Punjabi speech samples. The performance of the automatic speech recognition (ASR) system is reported for both monophone and triphone models (tri1, tri2 and tri3) using an N-gram language model. Performance was computed in terms of word error rate (WER). A significant reduction in WER was observed with the triphone models over the monophone model. The tri3 model also improved on the tri2 model, and the tri2 model improved on the tri1 model. Further, MFCC features were found to provide higher speech recognition accuracy than PLP features for continuous Punjabi speech.
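The record does not say how the authors computed WER. As a hedged illustration only, the usual definition (word-level edit distance divided by the number of reference words) can be sketched as follows. The function name `word_error_rate` and the whitespace tokenization are assumptions made for this sketch; they are not taken from the paper or from Kaldi.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Hypothetical helper for illustration; words are split on whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                       # deletion
                curr[j - 1] + 1,                   # insertion
                prev[j - 1] + substitution_cost,   # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, scoring the hypothesis "a x c" against the reference "a b c d" counts one substitution and one deletion over four reference words.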
doi_str_mv 10.1007/s10772-018-9497-6
format Article
fullrecord Record type: article (peer reviewed)
Abbreviated journal title: Int J Speech Technol
Publication date: 2018-06-01
EISSN: 1572-8110
Publisher: New York: Springer US
Rights: Springer Science+Business Media, LLC, part of Springer Nature 2018; Copyright Springer Science & Business Media 2018
Total pages: 6
Collections: CrossRef; Linguistics and Language Behavior Abstracts (LLBA)
PDF: https://link.springer.com/content/pdf/10.1007/s10772-018-9497-6
HTML: https://link.springer.com/10.1007/s10772-018-9497-6
fulltext fulltext
identifier ISSN: 1381-2416
ispartof International journal of speech technology, 2018-06, Vol.21 (2), p.211-216
issn 1381-2416
1572-8110
language eng
recordid cdi_proquest_journals_2038767307
source Springer Nature - Complete Springer Journals
subjects Artificial Intelligence
Automatic speech recognition
Continuous speech
Engineering
Feature extraction
Feature recognition
Linear prediction
N-Gram language models
Punjabi language
Signal, Image and Speech Processing
Social Sciences
Speech perception
Speech recognition
Voice recognition
title Continuous Punjabi speech recognition model based on Kaldi ASR toolkit
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-25T09%3A34%3A16IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Continuous%20Punjabi%20speech%20recognition%20model%20based%20on%20Kaldi%20ASR%20toolkit&rft.jtitle=International%20journal%20of%20speech%20technology&rft.au=Guglani,%20Jyoti&rft.date=2018-06-01&rft.volume=21&rft.issue=2&rft.spage=211&rft.epage=216&rft.pages=211-216&rft.issn=1381-2416&rft.eissn=1572-8110&rft_id=info:doi/10.1007/s10772-018-9497-6&rft_dat=%3Cproquest_cross%3E2038767307%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2038767307&rft_id=info:pmid/&rfr_iscdi=true