Significance of the Modified Group Delay Feature in Speech Recognition

Spectral representation of speech is complete when both the Fourier transform magnitude and phase spectra are specified. In conventional speech recognition systems, features are generally derived from the short-time magnitude spectrum. Although the importance of Fourier transform phase in speech per...

Bibliographic details
Published in: IEEE transactions on audio, speech, and language processing, 2007-01, Vol.15 (1), p.190-202
Main authors: Hegde, R.M., Murthy, H.A., Gadde, V.R.R.
Format: Article
Language: eng
container_end_page 202
container_issue 1
container_start_page 190
container_title IEEE transactions on audio, speech, and language processing
container_volume 15
creator Hegde, R.M.
Murthy, H.A.
Gadde, V.R.R.
description Spectral representation of speech is complete when both the Fourier transform magnitude and phase spectra are specified. In conventional speech recognition systems, features are generally derived from the short-time magnitude spectrum. Although the importance of Fourier transform phase in speech perception has been realized, few attempts have been made to extract features from it. This is primarily because the resonances of the speech signal, which manifest as transitions in the phase spectrum, are completely masked by the wrapping of the phase spectrum. Hence, an alternative to processing the Fourier transform phase for extracting speech features is to process the group delay function, which can be computed directly from the speech signal. The group delay function has been used in earlier efforts to extract pitch and formant information from the speech signal. In all these efforts, no attempt was made to extract features from the speech signal and use them for speech recognition applications. This is primarily because the group delay function fails to capture the short-time spectral structure of speech, owing to zeros that are close to the unit circle in the z-plane and also to pitch periodicity effects. In this paper, the group delay function is modified to overcome these effects. Cepstral features are extracted from the modified group delay function and are called the modified group delay feature (MODGDF). The MODGDF is used for three speech recognition tasks, namely speaker, language, and continuous-speech recognition. Based on the results of feature and performance evaluation, the significance of the MODGDF as a new feature for speech recognition is discussed.
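The abstract notes that the group delay function can be computed directly from the signal, without unwrapping the phase. A minimal sketch of that computation is given here, assuming NumPy and the standard identity tau(w) = (X_R*Y_R + X_I*Y_I) / |X|^2, where X is the Fourier transform of x[n] and Y is the Fourier transform of n*x[n]. The function name `group_delay`, the FFT length, and the `eps` floor are illustrative choices, not taken from the paper, and this is the unmodified group delay, not the MODGDF itself.

```python
import numpy as np

def group_delay(x, n_fft=512, eps=1e-12):
    """Group delay of a finite frame x[n], computed without phase unwrapping.

    Hypothetical helper: uses tau(w) = (X_R*Y_R + X_I*Y_I) / |X|^2,
    with X = FFT(x[n]) and Y = FFT(n * x[n]).
    """
    x = np.asarray(x, dtype=float)
    n = np.arange(len(x))                 # sample indices 0, 1, ..., N-1
    X = np.fft.rfft(x, n_fft)             # spectrum of the frame
    Y = np.fft.rfft(n * x, n_fft)         # spectrum of the index-weighted frame
    num = X.real * Y.real + X.imag * Y.imag
    den = np.abs(X) ** 2                  # squared magnitude spectrum
    return num / np.maximum(den, eps)     # floor avoids division by zero

# A pure delay of 5 samples has a constant group delay of 5.
impulse = np.zeros(32)
impulse[5] = 1.0
tau_delay = group_delay(impulse, n_fft=64)

# A zero at z = 0.99 (close to the unit circle) makes |X|^2 tiny near w = 0,
# so the group delay spikes there.
tau_zero = group_delay([1.0, -0.99], n_fft=64)
```

The second example illustrates the failure mode the abstract describes: the denominator |X|^2 becomes very small near a zero that lies close to the unit circle, so the function is dominated by a large spike rather than by the formant structure.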
doi_str_mv 10.1109/TASL.2006.876858
format Article
publisher Piscataway, NJ: IEEE
coden ITASD8
eissn 1558-7924
2329-9304
rights 2007 INIST-CNRS
Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2007
ieee_id 4032772
tpages 13
toplevel peer_reviewed
online_resources
fulltext fulltext_linktorsrc
identifier ISSN: 1558-7916
ispartof IEEE transactions on audio, speech, and language processing, 2007-01, Vol.15 (1), p.190-202
issn 1558-7916
2329-9290
1558-7924
2329-9304
language eng
recordid cdi_crossref_primary_10_1109_TASL_2006_876858
source IEEE Electronic Library (IEL)
subjects Applied sciences
Class separability
Data mining
Delay effects
Exact sciences and technology
Feature extraction
feature selection
Fourier transforms
Gaussian mixture models (GMMs)
Group delay
group delay function
hidden Markov models (HMMs)
Information, signal and communications theory
Mathematical analysis
Mathematical models
Miscellaneous
Pattern recognition
phase spectrum
Phase transformations
Resonance
robustness
Signal and communications theory
Signal processing
Signal representation. Spectral analysis
Signal, noise
Spectra
Speech
Speech coding
Speech processing
Speech recognition
Telecommunications and information theory
Voice recognition
Wrapping
title Significance of the Modified Group Delay Feature in Speech Recognition
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-17T00%3A50%3A24IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Significance%20of%20the%20Modified%20Group%20Delay%20Feature%20in%20Speech%20Recognition&rft.jtitle=IEEE%20transactions%20on%20audio,%20speech,%20and%20language%20processing&rft.au=Hegde,%20R.M.&rft.date=2007-01&rft.volume=15&rft.issue=1&rft.spage=190&rft.epage=202&rft.pages=190-202&rft.issn=1558-7916&rft.eissn=1558-7924&rft.coden=ITASD8&rft_id=info:doi/10.1109/TASL.2006.876858&rft_dat=%3Cproquest_RIE%3E875061278%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=917433970&rft_id=info:pmid/&rft_ieee_id=4032772&rfr_iscdi=true