Continuous Punjabi speech recognition model based on Kaldi ASR toolkit
This paper presents a continuous Punjabi speech recognition model built with the Kaldi toolkit. Mel frequency cepstral coefficient (MFCC) features and perceptual linear prediction (PLP) features were extracted from continuous Punjabi speech samples. The perfor...
Published in: | International journal of speech technology 2018-06, Vol.21 (2), p.211-216 |
---|---|
Main authors: | Guglani, Jyoti ; Mishra, A. N. |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 216 |
---|---|
container_issue | 2 |
container_start_page | 211 |
container_title | International journal of speech technology |
container_volume | 21 |
creator | Guglani, Jyoti ; Mishra, A. N. |
description | This paper presents a continuous Punjabi speech recognition model built with the Kaldi toolkit. Mel frequency cepstral coefficient (MFCC) features and perceptual linear prediction (PLP) features were extracted from continuous Punjabi speech samples. The performance of the automatic speech recognition (ASR) system is reported for the monophone model and for the triphone models (tri1, tri2 and tri3), each using an N-gram language model. Performance was measured in terms of word error rate (WER). The triphone models showed a significant reduction in WER over the monophone model. In addition, the tri3 model outperformed the tri2 model, which in turn outperformed the tri1 model. Further, MFCC features were found to provide higher recognition accuracy than PLP features for continuous Punjabi speech. |
doi_str_mv | 10.1007/s10772-018-9497-6 |
format | Article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2038767307</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2038767307</sourcerecordid><originalsourceid>FETCH-LOGICAL-c316t-a7def04bbd8a52aaa7d5f349cfcaee63dbfbd96c1fcc3717603e6f992caf8aed3</originalsourceid><addsrcrecordid>eNp1kEtLAzEUhYMoWKs_wF3AdTSPmWSyLMWqWFB8rEMmj5o6ndRkZuG_N2UEV67uPZdzzoUPgEuCrwnG4iYTLARFmDRIVlIgfgRmpC6XhhB8XHbWEEQrwk_BWc5bjLEUks7Aahn7IfRjHDN8HvutbgPMe-fMB0zOxE0fhhB7uIvWdbDV2VlY5KPubICL1xc4xNh9huEcnHjdZXfxO-fgfXX7trxH66e7h-VijQwjfEBaWOdx1ba20TXVuujas0oab7RznNnWt1ZyQ7wxTBDBMXPcS0mN9o12ls3B1dS7T_FrdHlQ2zimvrxUFLNGcMGwKC4yuUyKOSfn1T6FnU7fimB1wKUmXKrgUgdcipcMnTK5ePuNS3_N_4d-ABzibrc</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2038767307</pqid></control><display><type>article</type><title>Continuous Punjabi speech recognition model based on Kaldi ASR toolkit</title><source>Springer Nature - Complete Springer Journals</source><creator>Guglani, Jyoti ; Mishra, A. N.</creator><creatorcontrib>Guglani, Jyoti ; Mishra, A. N.</creatorcontrib><description>In this paper, continuous Punjabi speech recognition model is presented using Kaldi toolkit. For speech recognition, the extraction of Mel frequency cepstral coefficients (MFCC) features and perceptual linear prediction (PLP) features were extracted from Punjabi continuous speech samples. The performance of automatic speech recognition (ASR) system for both monophone and triphone model i.e., tri1, tri2 and tri3 model using N-gram language model is reported. The performance of ASR system were computed in terms of word error rate (WER). A significant reduction in WER was observed using the tri phone model over mono phone model ASR .Also the performance of ASR using tri3 model is improved over tri2 model and the performance of tri2 model is improved over tri1 model ASR. 
Further, it was found that MFCC feature provides higher speech recognition accuracy than PLP features for continuous Punjabi speech.</description><identifier>ISSN: 1381-2416</identifier><identifier>EISSN: 1572-8110</identifier><identifier>DOI: 10.1007/s10772-018-9497-6</identifier><language>eng</language><publisher>New York: Springer US</publisher><subject>Artificial Intelligence ; Automatic speech recognition ; Continuous speech ; Engineering ; Feature extraction ; Feature recognition ; Linear prediction ; N-Gram language models ; Punjabi language ; Signal,Image and Speech Processing ; Social Sciences ; Speech perception ; Speech recognition ; Voice recognition</subject><ispartof>International journal of speech technology, 2018-06, Vol.21 (2), p.211-216</ispartof><rights>Springer Science+Business Media, LLC, part of Springer Nature 2018</rights><rights>Copyright Springer Science & Business Media 2018</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c316t-a7def04bbd8a52aaa7d5f349cfcaee63dbfbd96c1fcc3717603e6f992caf8aed3</citedby><cites>FETCH-LOGICAL-c316t-a7def04bbd8a52aaa7d5f349cfcaee63dbfbd96c1fcc3717603e6f992caf8aed3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/s10772-018-9497-6$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1007/s10772-018-9497-6$$EHTML$$P50$$Gspringer$$H</linktohtml><link.rule.ids>314,776,780,27903,27904,41467,42536,51297</link.rule.ids></links><search><creatorcontrib>Guglani, Jyoti</creatorcontrib><creatorcontrib>Mishra, A. 
N.</creatorcontrib><title>Continuous Punjabi speech recognition model based on Kaldi ASR toolkit</title><title>International journal of speech technology</title><addtitle>Int J Speech Technol</addtitle><description>In this paper, continuous Punjabi speech recognition model is presented using Kaldi toolkit. For speech recognition, the extraction of Mel frequency cepstral coefficients (MFCC) features and perceptual linear prediction (PLP) features were extracted from Punjabi continuous speech samples. The performance of automatic speech recognition (ASR) system for both monophone and triphone model i.e., tri1, tri2 and tri3 model using N-gram language model is reported. The performance of ASR system were computed in terms of word error rate (WER). A significant reduction in WER was observed using the tri phone model over mono phone model ASR .Also the performance of ASR using tri3 model is improved over tri2 model and the performance of tri2 model is improved over tri1 model ASR. Further, it was found that MFCC feature provides higher speech recognition accuracy than PLP features for continuous Punjabi speech.</description><subject>Artificial Intelligence</subject><subject>Automatic speech recognition</subject><subject>Continuous speech</subject><subject>Engineering</subject><subject>Feature extraction</subject><subject>Feature recognition</subject><subject>Linear prediction</subject><subject>N-Gram language models</subject><subject>Punjabi language</subject><subject>Signal,Image and Speech Processing</subject><subject>Social Sciences</subject><subject>Speech perception</subject><subject>Speech recognition</subject><subject>Voice 
recognition</subject><issn>1381-2416</issn><issn>1572-8110</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2018</creationdate><recordtype>article</recordtype><recordid>eNp1kEtLAzEUhYMoWKs_wF3AdTSPmWSyLMWqWFB8rEMmj5o6ndRkZuG_N2UEV67uPZdzzoUPgEuCrwnG4iYTLARFmDRIVlIgfgRmpC6XhhB8XHbWEEQrwk_BWc5bjLEUks7Aahn7IfRjHDN8HvutbgPMe-fMB0zOxE0fhhB7uIvWdbDV2VlY5KPubICL1xc4xNh9huEcnHjdZXfxO-fgfXX7trxH66e7h-VijQwjfEBaWOdx1ba20TXVuujas0oab7RznNnWt1ZyQ7wxTBDBMXPcS0mN9o12ls3B1dS7T_FrdHlQ2zimvrxUFLNGcMGwKC4yuUyKOSfn1T6FnU7fimB1wKUmXKrgUgdcipcMnTK5ePuNS3_N_4d-ABzibrc</recordid><startdate>20180601</startdate><enddate>20180601</enddate><creator>Guglani, Jyoti</creator><creator>Mishra, A. N.</creator><general>Springer US</general><general>Springer Nature B.V</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7T9</scope></search><sort><creationdate>20180601</creationdate><title>Continuous Punjabi speech recognition model based on Kaldi ASR toolkit</title><author>Guglani, Jyoti ; Mishra, A. N.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c316t-a7def04bbd8a52aaa7d5f349cfcaee63dbfbd96c1fcc3717603e6f992caf8aed3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2018</creationdate><topic>Artificial Intelligence</topic><topic>Automatic speech recognition</topic><topic>Continuous speech</topic><topic>Engineering</topic><topic>Feature extraction</topic><topic>Feature recognition</topic><topic>Linear prediction</topic><topic>N-Gram language models</topic><topic>Punjabi language</topic><topic>Signal,Image and Speech Processing</topic><topic>Social Sciences</topic><topic>Speech perception</topic><topic>Speech recognition</topic><topic>Voice recognition</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Guglani, Jyoti</creatorcontrib><creatorcontrib>Mishra, A. 
N.</creatorcontrib><collection>CrossRef</collection><collection>Linguistics and Language Behavior Abstracts (LLBA)</collection><jtitle>International journal of speech technology</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Guglani, Jyoti</au><au>Mishra, A. N.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Continuous Punjabi speech recognition model based on Kaldi ASR toolkit</atitle><jtitle>International journal of speech technology</jtitle><stitle>Int J Speech Technol</stitle><date>2018-06-01</date><risdate>2018</risdate><volume>21</volume><issue>2</issue><spage>211</spage><epage>216</epage><pages>211-216</pages><issn>1381-2416</issn><eissn>1572-8110</eissn><abstract>In this paper, continuous Punjabi speech recognition model is presented using Kaldi toolkit. For speech recognition, the extraction of Mel frequency cepstral coefficients (MFCC) features and perceptual linear prediction (PLP) features were extracted from Punjabi continuous speech samples. The performance of automatic speech recognition (ASR) system for both monophone and triphone model i.e., tri1, tri2 and tri3 model using N-gram language model is reported. The performance of ASR system were computed in terms of word error rate (WER). A significant reduction in WER was observed using the tri phone model over mono phone model ASR .Also the performance of ASR using tri3 model is improved over tri2 model and the performance of tri2 model is improved over tri1 model ASR. Further, it was found that MFCC feature provides higher speech recognition accuracy than PLP features for continuous Punjabi speech.</abstract><cop>New York</cop><pub>Springer US</pub><doi>10.1007/s10772-018-9497-6</doi><tpages>6</tpages></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1381-2416 |
ispartof | International journal of speech technology, 2018-06, Vol.21 (2), p.211-216 |
issn | 1381-2416 ; 1572-8110 |
language | eng |
recordid | cdi_proquest_journals_2038767307 |
source | Springer Nature - Complete Springer Journals |
subjects | Artificial Intelligence ; Automatic speech recognition ; Continuous speech ; Engineering ; Feature extraction ; Feature recognition ; Linear prediction ; N-Gram language models ; Punjabi language ; Signal,Image and Speech Processing ; Social Sciences ; Speech perception ; Speech recognition ; Voice recognition |
title | Continuous Punjabi speech recognition model based on Kaldi ASR toolkit |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-25T09%3A34%3A16IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Continuous%20Punjabi%20speech%20recognition%20model%20based%20on%20Kaldi%20ASR%20toolkit&rft.jtitle=International%20journal%20of%20speech%20technology&rft.au=Guglani,%20Jyoti&rft.date=2018-06-01&rft.volume=21&rft.issue=2&rft.spage=211&rft.epage=216&rft.pages=211-216&rft.issn=1381-2416&rft.eissn=1572-8110&rft_id=info:doi/10.1007/s10772-018-9497-6&rft_dat=%3Cproquest_cross%3E2038767307%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2038767307&rft_id=info:pmid/&rfr_iscdi=true |
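The description in this record reports results as word error rate (WER) without spelling out the metric. As a rough illustration only, WER is commonly computed as the word-level edit distance (substitutions, deletions and insertions) between a reference transcript and the recognizer output, divided by the number of reference words. The sketch below assumes that standard definition; the function name `word_error_rate` and the sample strings are hypothetical and are not taken from the paper or from Kaldi.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = word-level edit distance / number of reference words.

    Illustrative sketch only: whitespace tokenization is an assumption,
    and this is not the scoring code used in the paper.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical transcripts: one substitution and one deletion
    # against a four-word reference.
    print(word_error_rate("a b c d", "a x c"))
```

Under this definition a lower value is better, and the value can exceed 1.0 when the hypothesis contains many inserted words.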