Structural design of hidden Markov model speech recognizer using multivalued phonetic features : comparison with segmental speech units

A novel approach to speech recognition, on the basis of a multidimensional multivalued phonetic-feature description of speech signals, is presented and evaluated. The hidden Markov model (HMM) framework is used to provide the recognition algorithm, which assumes that the underlying Markov chain trac...

Detailed description

Bibliographic details
Published in: The Journal of the Acoustical Society of America 1992-12, Vol.92 (6), p.3058-3067
Main authors: DENG, L; ERLER, K
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 3067
container_issue 6
container_start_page 3058
container_title The Journal of the Acoustical Society of America
container_volume 92
creator DENG, L
ERLER, K
description A novel approach to speech recognition, on the basis of a multidimensional multivalued phonetic-feature description of speech signals, is presented and evaluated. The hidden Markov model (HMM) framework is used to provide the recognition algorithm, which assumes that the underlying Markov chain tracks the temporal evolution of the features. It is shown that this approach can naturally accommodate such coarticulatory effects as feature spreading and formant transition in the functionality of the recognizer, and can provide a high degree of acoustic data sharing that makes effective use of training data. Use of phonetic features as the basic speech units creates a framework where the Markov model's state topology in the recognizer can be designed with guidance of detailed speech knowledge. Details of such a design for a stop consonant-vowel vocabulary are described. Experimental results on the task of speaker-dependent stop consonant discrimination, evaluated from speech data from a total of ten male and five female speakers, demonstrate effectiveness of this feature-based recognizer. Over the 15 speakers, the error rates were shown to be reduced by 23%, 37%, 42%, and 38%, respectively, compared with the conventional HMM-based recognition methods using words, phonemes, allophones, and microsegments as the primary speech units.
doi_str_mv 10.1121/1.404202
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_73429837</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>73429837</sourcerecordid><originalsourceid>FETCH-LOGICAL-c336t-5cf94db5d29bad874e42a7f96875b635f04c1ee8d2ba9bd54e4596d9e74b298f3</originalsourceid><addsrcrecordid>eNpFkM1u1DAURi0EaqcFiRdA8gJVbNL6N4m7q6pSkIpYAOvIsa9nTBM79U9ReQFem6AZldXV1Xd0Fgeht5ScU8roBT0XRDDCXqANlYw0vWTiJdoQQmgjVNseo5Ocf66v7Lk6QkdUdIIxukF_vpVUTalJT9hC9tuAo8M7by0E_EWn-_iI52hhwnkBMDucwMRt8L8h4Zp92OK5TsU_6qmCxcsuBijeYAd6dULGl9jEedHJ5xjwL192OMN2hlD0s7EGX_Jr9MrpKcObwz1FPz7efL_-1Nx9vf18fXXXGM7b0kjjlLCjtEyN2vadAMF051Tbd3JsuXREGArQWzZqNVq57lK1VkEnRqZ6x0_R2d67pPhQIZdh9tnANOkAseah42LleLeCH_agSTHnBG5Ykp91ehooGf41H-iwb76i7w7OOs5g_4P7yOv-_rDrbPTkkg7G52dMSCF53_O_e0SLfQ</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>73429837</pqid></control><display><type>article</type><title>Structural design of hidden Markov model speech recognizer using multivalued phonetic features : comparison with segmental speech units</title><source>MEDLINE</source><source>AIP Acoustical Society of America</source><creator>DENG, L ; ERLER, K</creator><creatorcontrib>DENG, L ; ERLER, K</creatorcontrib><description>A novel approach to speech recognition, on the basis of a multidimensional multivalued phonetic-feature description of speech signals, is presented and evaluated. The hidden Markov model (HMM) framework is used to provide the recognition algorithm, which assumes that the underlying Markov chain tracks the temporal evolution of the features. It is shown that this approach can naturally accommodate such coarticulatory effects as feature spreading and formant transition in the functionality of the recognizer, and can provide a high degree of acoustic data sharing that makes effective use of training data. 
Use of phonetic features as the basic speech units creates a framework where the Markov model's state topology in the recognizer can be designed with guidance of detailed speech knowledge. Details of such a design for a stop consonant-vowel vocabulary are described. Experimental results on the task of speaker-dependent stop consonant discrimination, evaluated from speech data from a total of ten male and five female speakers, demonstrate effectiveness of this feature-based recognizer. Over the 15 speakers, the error rates were shown to be reduced by 23%, 37%, 42%, and 38%, respectively, compared with the conventional HMM-based recognition methods using words, phonemes, allophones, and microsegments as the primary speech units.</description><identifier>ISSN: 0001-4966</identifier><identifier>EISSN: 1520-8524</identifier><identifier>DOI: 10.1121/1.404202</identifier><identifier>PMID: 1474221</identifier><identifier>CODEN: JASMAN</identifier><language>eng</language><publisher>Woodbury, NY: Acoustical Society of America</publisher><subject>Applied sciences ; Artificial intelligence ; Communication ; Computer science; control theory; systems ; Exact sciences and technology ; Female ; Humans ; Male ; Markov Chains ; Models, Theoretical ; Phonetics ; Speech Acoustics ; Speech and sound recognition and synthesis. 
Linguistics ; Speech Perception ; Vocabulary</subject><ispartof>The Journal of the Acoustical Society of America, 1992-12, Vol.92 (6), p.3058-3067</ispartof><rights>1993 INIST-CNRS</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c336t-5cf94db5d29bad874e42a7f96875b635f04c1ee8d2ba9bd54e4596d9e74b298f3</citedby></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>207,314,778,782,27911,27912</link.rule.ids><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&amp;idt=4545388$$DView record in Pascal Francis$$Hfree_for_read</backlink><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/1474221$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>DENG, L</creatorcontrib><creatorcontrib>ERLER, K</creatorcontrib><title>Structural design of hidden Markov model speech recognizer using multivalued phonetic features : comparison with segmental speech units</title><title>The Journal of the Acoustical Society of America</title><addtitle>J Acoust Soc Am</addtitle><description>A novel approach to speech recognition, on the basis of a multidimensional multivalued phonetic-feature description of speech signals, is presented and evaluated. The hidden Markov model (HMM) framework is used to provide the recognition algorithm, which assumes that the underlying Markov chain tracks the temporal evolution of the features. It is shown that this approach can naturally accommodate such coarticulatory effects as feature spreading and formant transition in the functionality of the recognizer, and can provide a high degree of acoustic data sharing that makes effective use of training data. 
Use of phonetic features as the basic speech units creates a framework where the Markov model's state topology in the recognizer can be designed with guidance of detailed speech knowledge. Details of such a design for a stop consonant-vowel vocabulary are described. Experimental results on the task of speaker-dependent stop consonant discrimination, evaluated from speech data from a total of ten male and five female speakers, demonstrate effectiveness of this feature-based recognizer. Over the 15 speakers, the error rates were shown to be reduced by 23%, 37%, 42%, and 38%, respectively, compared with the conventional HMM-based recognition methods using words, phonemes, allophones, and microsegments as the primary speech units.</description><subject>Applied sciences</subject><subject>Artificial intelligence</subject><subject>Communication</subject><subject>Computer science; control theory; systems</subject><subject>Exact sciences and technology</subject><subject>Female</subject><subject>Humans</subject><subject>Male</subject><subject>Markov Chains</subject><subject>Models, Theoretical</subject><subject>Phonetics</subject><subject>Speech Acoustics</subject><subject>Speech and sound recognition and synthesis. 
Linguistics</subject><subject>Speech Perception</subject><subject>Vocabulary</subject><issn>0001-4966</issn><issn>1520-8524</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>1992</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNpFkM1u1DAURi0EaqcFiRdA8gJVbNL6N4m7q6pSkIpYAOvIsa9nTBM79U9ReQFem6AZldXV1Xd0Fgeht5ScU8roBT0XRDDCXqANlYw0vWTiJdoQQmgjVNseo5Ocf66v7Lk6QkdUdIIxukF_vpVUTalJT9hC9tuAo8M7by0E_EWn-_iI52hhwnkBMDucwMRt8L8h4Zp92OK5TsU_6qmCxcsuBijeYAd6dULGl9jEedHJ5xjwL192OMN2hlD0s7EGX_Jr9MrpKcObwz1FPz7efL_-1Nx9vf18fXXXGM7b0kjjlLCjtEyN2vadAMF051Tbd3JsuXREGArQWzZqNVq57lK1VkEnRqZ6x0_R2d67pPhQIZdh9tnANOkAseah42LleLeCH_agSTHnBG5Ykp91ehooGf41H-iwb76i7w7OOs5g_4P7yOv-_rDrbPTkkg7G52dMSCF53_O_e0SLfQ</recordid><startdate>19921201</startdate><enddate>19921201</enddate><creator>DENG, L</creator><creator>ERLER, K</creator><general>Acoustical Society of America</general><general>American Institute of Physics</general><scope>IQODW</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><scope>8BM</scope></search><sort><creationdate>19921201</creationdate><title>Structural design of hidden Markov model speech recognizer using multivalued phonetic features : comparison with segmental speech units</title><author>DENG, L ; ERLER, K</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c336t-5cf94db5d29bad874e42a7f96875b635f04c1ee8d2ba9bd54e4596d9e74b298f3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>1992</creationdate><topic>Applied sciences</topic><topic>Artificial intelligence</topic><topic>Communication</topic><topic>Computer science; control theory; systems</topic><topic>Exact sciences and technology</topic><topic>Female</topic><topic>Humans</topic><topic>Male</topic><topic>Markov Chains</topic><topic>Models, 
Theoretical</topic><topic>Phonetics</topic><topic>Speech Acoustics</topic><topic>Speech and sound recognition and synthesis. Linguistics</topic><topic>Speech Perception</topic><topic>Vocabulary</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>DENG, L</creatorcontrib><creatorcontrib>ERLER, K</creatorcontrib><collection>Pascal-Francis</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><collection>ComDisDome</collection><jtitle>The Journal of the Acoustical Society of America</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>DENG, L</au><au>ERLER, K</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Structural design of hidden Markov model speech recognizer using multivalued phonetic features : comparison with segmental speech units</atitle><jtitle>The Journal of the Acoustical Society of America</jtitle><addtitle>J Acoust Soc Am</addtitle><date>1992-12-01</date><risdate>1992</risdate><volume>92</volume><issue>6</issue><spage>3058</spage><epage>3067</epage><pages>3058-3067</pages><issn>0001-4966</issn><eissn>1520-8524</eissn><coden>JASMAN</coden><abstract>A novel approach to speech recognition, on the basis of a multidimensional multivalued phonetic-feature description of speech signals, is presented and evaluated. The hidden Markov model (HMM) framework is used to provide the recognition algorithm, which assumes that the underlying Markov chain tracks the temporal evolution of the features. 
It is shown that this approach can naturally accommodate such coarticulatory effects as feature spreading and formant transition in the functionality of the recognizer, and can provide a high degree of acoustic data sharing that makes effective use of training data. Use of phonetic features as the basic speech units creates a framework where the Markov model's state topology in the recognizer can be designed with guidance of detailed speech knowledge. Details of such a design for a stop consonant-vowel vocabulary are described. Experimental results on the task of speaker-dependent stop consonant discrimination, evaluated from speech data from a total of ten male and five female speakers, demonstrate effectiveness of this feature-based recognizer. Over the 15 speakers, the error rates were shown to be reduced by 23%, 37%, 42%, and 38%, respectively, compared with the conventional HMM-based recognition methods using words, phonemes, allophones, and microsegments as the primary speech units.</abstract><cop>Woodbury, NY</cop><pub>Acoustical Society of America</pub><pmid>1474221</pmid><doi>10.1121/1.404202</doi><tpages>10</tpages></addata></record>
fulltext fulltext
identifier ISSN: 0001-4966
ispartof The Journal of the Acoustical Society of America, 1992-12, Vol.92 (6), p.3058-3067
issn 0001-4966
1520-8524
language eng
recordid cdi_proquest_miscellaneous_73429837
source MEDLINE; AIP Acoustical Society of America
subjects Applied sciences
Artificial intelligence
Communication
Computer science; control theory; systems
Exact sciences and technology
Female
Humans
Male
Markov Chains
Models, Theoretical
Phonetics
Speech Acoustics
Speech and sound recognition and synthesis. Linguistics
Speech Perception
Vocabulary
title Structural design of hidden Markov model speech recognizer using multivalued phonetic features : comparison with segmental speech units
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-16T01%3A48%3A04IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Structural%20design%20of%20hidden%20Markov%20model%20speech%20recognizer%20using%20multivalued%20phonetic%20features%20:%20comparison%20with%20segmental%20speech%20units&rft.jtitle=The%20Journal%20of%20the%20Acoustical%20Society%20of%20America&rft.au=DENG,%20L&rft.date=1992-12-01&rft.volume=92&rft.issue=6&rft.spage=3058&rft.epage=3067&rft.pages=3058-3067&rft.issn=0001-4966&rft.eissn=1520-8524&rft.coden=JASMAN&rft_id=info:doi/10.1121/1.404202&rft_dat=%3Cproquest_cross%3E73429837%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=73429837&rft_id=info:pmid/1474221&rfr_iscdi=true