The importance of cepstral parameter correlations in speech recognition

Detailed description

Bibliographic details
Published in:	Computer speech & language 1994-07, Vol.8 (3), p.223-232
First author:	Ljolje, Andrej
Format:	Article
Language:	English
Subjects:	Applied sciences; Artificial intelligence; Computer science, control theory, systems; Exact sciences and technology; Speech and sound recognition and synthesis. Linguistics
Online access:	Full text

container_end_page	232
container_issue	3
container_start_page	223
container_title	Computer speech & language
container_volume	8
creator	Ljolje, Andrej
description	In this work we demonstrate that explicitly modeling correlations between spectral parameters improves speech models, both in descriptive power (higher likelihoods) and in classification accuracy. Most large-vocabulary speech recognition systems are based on some form of hidden Markov model (HMM) for sub-word speech segments, and these segments are usually represented by short-term spectra. Here we employ three-state left-to-right phone models and LPC cepstral parameters, together with their first- and second-order time differentials, and investigate the importance of modeling correlations between cepstral parameters for high-accuracy phone recognition. Several types of state-output distribution are compared. The simplest is a single multivariate Gaussian with a full covariance matrix. The next is a weighted mixture of multivariate Gaussians with diagonal covariances, which models parameter correlations only implicitly. The most elaborate is a mixture of Gaussians like the previous model, but it additionally applies a parameter-space rotation specific to each HMM state; it therefore models parameter correlations explicitly, in the same way as the simplest model with a single distribution per state. On the February 1989 test set of the DARPA Resource Management task, the most elaborate model (mixtures with space rotation) gives the highest phone accuracy, 82.4%. The single distributions, which also model correlations explicitly, come next at 80.8%. The diagonal-covariance mixtures, which model correlations only implicitly, perform worst at 78.7%. These results demonstrate the importance of explicitly modeling parameter correlations for improving speech recognition performance.
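The three kinds of state-output density compared in the abstract can be illustrated with a short sketch. It is not the paper's implementation: it assumes NumPy, uses two correlated synthetic features in place of LPC cepstra and their differentials, fits parameters by maximum likelihood from sample statistics, and takes the state-specific rotation to be the eigenvector matrix of the state covariance (one plausible choice; the abstract does not say how the rotation is estimated). All function names are hypothetical.

```python
import numpy as np

LOG_2PI = np.log(2.0 * np.pi)


def full_cov_loglik(x, mean, cov):
    """Single Gaussian with a full covariance: correlations are explicit."""
    d = x.shape[-1]
    chol = np.linalg.cholesky(cov)               # cov = L @ L.T
    z = np.linalg.solve(chol, x - mean)          # whitened residual L^-1 (x - mean)
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (d * LOG_2PI + log_det + z @ z)


def diag_gmm_loglik(x, weights, means, variances):
    """Mixture of diagonal Gaussians: correlations only implicit.

    weights: (K,), means and variances: (K, d).
    """
    d = x.shape[-1]
    diff = x - means                              # broadcast to (K, d)
    comp = -0.5 * (d * LOG_2PI
                   + np.sum(np.log(variances), axis=1)
                   + np.sum(diff ** 2 / variances, axis=1))
    a = np.log(weights) + comp                    # per-component log(w_k N_k)
    m = np.max(a)
    return m + np.log(np.sum(np.exp(a - m)))      # stable log-sum-exp


def rotated_diag_gmm_loglik(x, rotation, weights, means, variances):
    """Diagonal mixture evaluated after a state-specific rotation.

    Assumes `rotation` is orthonormal, so the Jacobian |det| is 1, and
    that `means` are expressed in the rotated space.
    """
    return diag_gmm_loglik(rotation.T @ x, weights, means, variances)


# Toy "state" data: two strongly correlated features standing in for cepstra.
rng = np.random.default_rng(0)
X = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=2000)

mean = X.mean(axis=0)
cov = np.cov(X, rowvar=False, bias=True)          # maximum-likelihood covariance
eigvals, R = np.linalg.eigh(cov)                  # cov = R @ diag(eigvals) @ R.T
one = np.array([1.0])                             # single mixture weight (K = 1)

full = np.mean([full_cov_loglik(x, mean, cov) for x in X])
diag = np.mean([diag_gmm_loglik(x, one, mean[None, :], np.diag(cov)[None, :])
                for x in X])
rot = np.mean([rotated_diag_gmm_loglik(x, R, one, (R.T @ mean)[None, :],
                                       eigvals[None, :]) for x in X])
print(f"full: {full:.3f}  diagonal: {diag:.3f}  rotated diagonal: {rot:.3f}")
```

For simplicity the demo uses a single mixture component (K = 1). In that case the rotated diagonal model reproduces the full-covariance log-likelihood exactly, which is the sense in which both model correlations explicitly, while the unrotated diagonal fit scores a lower average likelihood on the correlated data. With more components, a diagonal mixture can only approximate such correlations implicitly, through the placement of its components.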
doi_str_mv	10.1006/csla.1994.1011
format	Article
publisher	Oxford: Elsevier Ltd
rights	1994 Academic Press; 1994 INIST-CNRS
coden	CSPLEO
linktohtml	https://www.sciencedirect.com/science/article/pii/S0885230884710114
backlink	http://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&idt=4177476
fulltext	fulltext
identifier	ISSN: 0885-2308
ispartof	Computer speech & language, 1994-07, Vol.8 (3), p.223-232
issn	0885-2308 1095-8363
language	eng
recordid	cdi_proquest_miscellaneous_85577725
source	Elsevier ScienceDirect Journals; Periodicals Index Online
subjects	Applied sciences; Artificial intelligence; Computer science, control theory, systems; Exact sciences and technology; Speech and sound recognition and synthesis. Linguistics
title	The importance of cepstral parameter correlations in speech recognition
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-27T08%3A24%3A20IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=The%20importance%20of%20cepstral%20parameter%20correlations%20in%20speech%20recognition&rft.jtitle=Computer%20speech%20&%20language&rft.au=Ljolje,%20Andrej&rft.date=1994-07-01&rft.volume=8&rft.issue=3&rft.spage=223&rft.epage=232&rft.pages=223-232&rft.issn=0885-2308&rft.eissn=1095-8363&rft.coden=CSPLEO&rft_id=info:doi/10.1006/csla.1994.1011&rft_dat=%3Cproquest_cross%3E85577725%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1306663891&rft_id=info:pmid/&rft_els_id=S0885230884710114&rfr_iscdi=true