Mel-scaled Discrete Wavelet Transform and dynamic features for the Persian phoneme recognition

In this paper we use a feature vector consisting of the Mel Frequency Discrete Wavelet Coefficients to recognize spoken phonemes in the Persian language. The purpose of using wavelet in feature extraction is to benefit from its multi resolution analysis and localization property in time and frequenc...

Full description

Bibliographic details
Main authors:	Tavanaei, A.; Manzuri, M. T.; Sameti, H.
Format:	Conference proceeding
Language:	eng
Subjects:	Discrete wavelet transforms; Feature extraction; Mel frequency cepstral coefficient; mel-scaled wavelet transform; MFCC; phoneme recognition; Speech; Speech recognition; wavelet transform
Online access:	Order full text

container_end_page	140
container_issue
container_start_page	138
container_title
container_volume
creator	Tavanaei, A. ; Manzuri, M. T. ; Sameti, H.
description	In this paper we use a feature vector consisting of the Mel Frequency Discrete Wavelet Coefficients (MFDWCs) to recognize spoken phonemes in the Persian language. The purpose of using wavelets in feature extraction is to benefit from their multiresolution analysis and their localization property in the time and frequency domains. The MFDWCs are obtained by applying the Discrete Wavelet Transform (DWT) to the Mel-scaled log filter bank energies of a speech frame. The feature vectors are used for HMM-based phoneme recognition on a portion of the FarsDat Persian language database consisting of 35 hours of recorded data for training and 15 hours for testing. We evaluate the performance of the new features on clean speech and noisy speech and compare it with that of the Mel Frequency Cepstral Coefficients (MFCC). Experiments on a phone recognition task based on the MFDWC give better results than recognizers based on the MFCC features for both the white noise and clean speech cases.
doi_str_mv	10.1109/AISP.2011.5960989
format	Conference Proceeding
fullrecord	<record><control><sourceid>ieee_6IE</sourceid><recordid>TN_cdi_ieee_primary_5960989</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>5960989</ieee_id><sourcerecordid>5960989</sourcerecordid><originalsourceid>FETCH-LOGICAL-i90t-3e86294c5b788fd84b2096ccb30231d389fe32bd912e36a98260e3d9e1790b3c3</originalsourceid><addsrcrecordid>eNo1kMFKAzEURSMiqLUfIG7yA1OTvGkmb1mq1kLFggV3lkzyxkZmMiUZhf69BevdXA4XzuIydivFREqB97Pl23qihJSTKWqBBs_YGCsjS1WWaKCcnrPrfwC4ZOOcv8QxWptKyyv28UJtkZ1tyfOHkF2igfi7_aGWBr5JNuamTx230XN_iLYLjjdkh-9EmR8XPuyIrynlYCPf7_pIHfFErv-MYQh9vGEXjW0zjU89Ypunx838uVi9Lpbz2aoIKIYCyGiFpZvWlTGNN2WtBGrnahAKpAeDDYGqPUpFoC0apQWBR5IVihocjNjdnzYQ0XafQmfTYXt6BH4BZ6dVGg</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>Mel-scaled Discrete Wavelet Transform and dynamic features for the Persian phoneme recognition</title><source>IEEE Electronic Library (IEL) Conference Proceedings</source><creator>Tavanaei, A. ; Manzuri, M. T. ; Sameti, H.</creator><creatorcontrib>Tavanaei, A. ; Manzuri, M. T. ; Sameti, H.</creatorcontrib><description>In this paper we use a feature vector consisting of the Mel Frequency Discrete Wavelet Coefficients to recognize spoken phonemes in the Persian language. The purpose of using wavelet in feature extraction is to benefit from its multi resolution analysis and localization property in time and frequency domains. The MFDWCs are obtained by applying the Discrete Wavelet Transform (DWT) to the Mel-scaled log filter bank energies of a speech frame. Feature vectors are used for the HMM-based phoneme recognition on a portion of the FarsDat Persian language database consisting of 35 hour recorded data for training and 15 hour for testing. We evaluate the performance of new features for clean speech and noisy speech and compare it with the Mel Frequency Cepstral Coefficients (MFCC). 
Experiments on a phone recognition task based on the MFDWC give better result than recognizers based on the MFCC features for both white noise and clean speech cases.</description><identifier>ISBN: 1424498333</identifier><identifier>ISBN: 9781424498338</identifier><identifier>EISBN: 9781424498345</identifier><identifier>EISBN: 9781424498321</identifier><identifier>EISBN: 1424498325</identifier><identifier>EISBN: 1424498341</identifier><identifier>DOI: 10.1109/AISP.2011.5960989</identifier><language>eng</language><publisher>IEEE</publisher><subject>Discrete wavelet transforms ; Feature extraction ; Mel frequency cepstral coefficient ; mel-scaled wavelet transform ; MFCC ; phoneme recognition ; Speech ; Speech recognition ; wavelet transform</subject><ispartof>2011 International Symposium on Artificial Intelligence and Signal Processing (AISP), 2011, p.138-140</ispartof><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/5960989$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,778,782,787,788,2054,27914,54909</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/5960989$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Tavanaei, A.</creatorcontrib><creatorcontrib>Manzuri, M. T.</creatorcontrib><creatorcontrib>Sameti, H.</creatorcontrib><title>Mel-scaled Discrete Wavelet Transform and dynamic features for the Persian phoneme recognition</title><title>2011 International Symposium on Artificial Intelligence and Signal Processing (AISP)</title><addtitle>AISP</addtitle><description>In this paper we use a feature vector consisting of the Mel Frequency Discrete Wavelet Coefficients to recognize spoken phonemes in the Persian language. 
The purpose of using wavelet in feature extraction is to benefit from its multi resolution analysis and localization property in time and frequency domains. The MFDWCs are obtained by applying the Discrete Wavelet Transform (DWT) to the Mel-scaled log filter bank energies of a speech frame. Feature vectors are used for the HMM-based phoneme recognition on a portion of the FarsDat Persian language database consisting of 35 hour recorded data for training and 15 hour for testing. We evaluate the performance of new features for clean speech and noisy speech and compare it with the Mel Frequency Cepstral Coefficients (MFCC). Experiments on a phone recognition task based on the MFDWC give better result than recognizers based on the MFCC features for both white noise and clean speech cases.</description><subject>Discrete wavelet transforms</subject><subject>Feature extraction</subject><subject>Mel frequency cepstral coefficient</subject><subject>mel-scaled wavelet transform</subject><subject>MFCC</subject><subject>phoneme recognition</subject><subject>Speech</subject><subject>Speech recognition</subject><subject>wavelet transform</subject><isbn>1424498333</isbn><isbn>9781424498338</isbn><isbn>9781424498345</isbn><isbn>9781424498321</isbn><isbn>1424498325</isbn><isbn>1424498341</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2011</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><sourceid>RIE</sourceid><recordid>eNo1kMFKAzEURSMiqLUfIG7yA1OTvGkmb1mq1kLFggV3lkzyxkZmMiUZhf69BevdXA4XzuIydivFREqB97Pl23qihJSTKWqBBs_YGCsjS1WWaKCcnrPrfwC4ZOOcv8QxWptKyyv28UJtkZ1tyfOHkF2igfi7_aGWBr5JNuamTx230XN_iLYLjjdkh-9EmR8XPuyIrynlYCPf7_pIHfFErv-MYQh9vGEXjW0zjU89Ypunx838uVi9Lpbz2aoIKIYCyGiFpZvWlTGNN2WtBGrnahAKpAeDDYGqPUpFoC0apQWBR5IVihocjNjdnzYQ0XafQmfTYXt6BH4BZ6dVGg</recordid><startdate>201106</startdate><enddate>201106</enddate><creator>Tavanaei, A.</creator><creator>Manzuri, M. 
T.</creator><creator>Sameti, H.</creator><general>IEEE</general><scope>6IE</scope><scope>6IL</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIL</scope></search><sort><creationdate>201106</creationdate><title>Mel-scaled Discrete Wavelet Transform and dynamic features for the Persian phoneme recognition</title><author>Tavanaei, A. ; Manzuri, M. T. ; Sameti, H.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-i90t-3e86294c5b788fd84b2096ccb30231d389fe32bd912e36a98260e3d9e1790b3c3</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2011</creationdate><topic>Discrete wavelet transforms</topic><topic>Feature extraction</topic><topic>Mel frequency cepstral coefficient</topic><topic>mel-scaled wavelet transform</topic><topic>MFCC</topic><topic>phoneme recognition</topic><topic>Speech</topic><topic>Speech recognition</topic><topic>wavelet transform</topic><toplevel>online_resources</toplevel><creatorcontrib>Tavanaei, A.</creatorcontrib><creatorcontrib>Manzuri, M. T.</creatorcontrib><creatorcontrib>Sameti, H.</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE Electronic Library (IEL)</collection><collection>IEEE Proceedings Order Plans (POP All) 1998-Present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Tavanaei, A.</au><au>Manzuri, M. 
T.</au><au>Sameti, H.</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>Mel-scaled Discrete Wavelet Transform and dynamic features for the Persian phoneme recognition</atitle><btitle>2011 International Symposium on Artificial Intelligence and Signal Processing (AISP)</btitle><stitle>AISP</stitle><date>2011-06</date><risdate>2011</risdate><spage>138</spage><epage>140</epage><pages>138-140</pages><isbn>1424498333</isbn><isbn>9781424498338</isbn><eisbn>9781424498345</eisbn><eisbn>9781424498321</eisbn><eisbn>1424498325</eisbn><eisbn>1424498341</eisbn><abstract>In this paper we use a feature vector consisting of the Mel Frequency Discrete Wavelet Coefficients to recognize spoken phonemes in the Persian language. The purpose of using wavelet in feature extraction is to benefit from its multi resolution analysis and localization property in time and frequency domains. The MFDWCs are obtained by applying the Discrete Wavelet Transform (DWT) to the Mel-scaled log filter bank energies of a speech frame. Feature vectors are used for the HMM-based phoneme recognition on a portion of the FarsDat Persian language database consisting of 35 hour recorded data for training and 15 hour for testing. We evaluate the performance of new features for clean speech and noisy speech and compare it with the Mel Frequency Cepstral Coefficients (MFCC). Experiments on a phone recognition task based on the MFDWC give better result than recognizers based on the MFCC features for both white noise and clean speech cases.</abstract><pub>IEEE</pub><doi>10.1109/AISP.2011.5960989</doi><tpages>3</tpages></addata></record>
fulltext	fulltext_linktorsrc
identifier	ISBN: 1424498333
ispartof	2011 International Symposium on Artificial Intelligence and Signal Processing (AISP), 2011, p.138-140
issn
language	eng
recordid	cdi_ieee_primary_5960989
source	IEEE Electronic Library (IEL) Conference Proceedings
subjects	Discrete wavelet transforms ; Feature extraction ; Mel frequency cepstral coefficient ; mel-scaled wavelet transform ; MFCC ; phoneme recognition ; Speech ; Speech recognition ; wavelet transform
title	Mel-scaled Discrete Wavelet Transform and dynamic features for the Persian phoneme recognition
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-15T10%3A03%3A33IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Mel-scaled%20Discrete%20Wavelet%20Transform%20and%20dynamic%20features%20for%20the%20Persian%20phoneme%20recognition&rft.btitle=2011%20International%20Symposium%20on%20Artificial%20Intelligence%20and%20Signal%20Processing%20(AISP)&rft.au=Tavanaei,%20A.&rft.date=2011-06&rft.spage=138&rft.epage=140&rft.pages=138-140&rft.isbn=1424498333&rft.isbn_list=9781424498338&rft_id=info:doi/10.1109/AISP.2011.5960989&rft_dat=%3Cieee_6IE%3E5960989%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&rft.eisbn=9781424498345&rft.eisbn_list=9781424498321&rft.eisbn_list=1424498325&rft.eisbn_list=1424498341&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=5960989&rfr_iscdi=true
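
The description above states that the MFDWCs are obtained by applying the DWT to the Mel-scaled log filter bank energies of a speech frame. The following is a minimal sketch of that pipeline, not the authors' implementation. The record does not give the wavelet family, the number of filters, the decomposition depth, or the frame and FFT sizes, so the Haar wavelet, 32 filters, 3 levels, a 400-sample frame at 16 kHz and a 512-point transform are all assumptions chosen for illustration, as are the function names. Only the Python standard library is used, so the spectrum is computed with a naive DFT.

```python
import cmath
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame, n_fft):
    """Naive DFT power spectrum of a Hamming-windowed, zero-padded frame."""
    n = len(frame)
    windowed = [s * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    spectrum = []
    for k in range(n_fft // 2 + 1):
        acc = sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n_fft)
                  for t in range(n))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def mel_filter_bank(n_filters, n_fft, sample_rate):
    """Triangular filters whose centres are evenly spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    mels = [top * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mels]
    bank = [[0.0] * (n_fft // 2 + 1) for _ in range(n_filters)]
    for i in range(n_filters):
        left, centre, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, centre):        # rising edge
            bank[i][k] = (k - left) / (centre - left)
        for k in range(centre, right):       # falling edge
            bank[i][k] = (right - k) / (right - centre)
    return bank


def haar_dwt(values, levels):
    """Multi-level orthonormal Haar DWT (assumed wavelet).

    Returns the final approximation followed by the detail coefficients,
    coarsest level first. len(values) must be divisible by 2**levels.
    """
    approx, details = list(values), []
    for _ in range(levels):
        pairs = list(zip(approx[0::2], approx[1::2]))
        details = [(a - b) / math.sqrt(2.0) for a, b in pairs] + details
        approx = [(a + b) / math.sqrt(2.0) for a, b in pairs]
    return approx + details


def mfdwc(frame, sample_rate=16000, n_filters=32, levels=3, n_fft=512):
    """DWT of the mel-scaled log filter bank energies of one speech frame."""
    power = power_spectrum(frame, n_fft)
    bank = mel_filter_bank(n_filters, n_fft, sample_rate)
    # The small constant guards against log(0) for an empty filter.
    log_energies = [math.log(sum(w * p for w, p in zip(row, power)) + 1e-10)
                    for row in bank]
    return haar_dwt(log_energies, levels)


# Example: a 25 ms frame of a synthetic 440 Hz tone (a stand-in for speech).
frame = [math.sin(2.0 * math.pi * 440.0 * t / 16000) for t in range(400)]
features = mfdwc(frame)  # one coefficient per mel filter (32 here)
```

In this sketch the only difference from a conventional MFCC front end is the final step: MFCCs apply a discrete cosine transform to the same log filter bank energies, whereas here a wavelet transform is used, so each coefficient depends only on a local group of neighbouring mel bands rather than on the whole spectrum.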