Joint Source-Filter Optimization for Accurate Vocal Tract Estimation Using Differential Evolution

In this work, we present a joint source-filter optimization approach for separating voiced speech into vocal tract (VT) and voice source components. The presented method is pitch-synchronous and thereby exhibits a high robustness against vocal jitter, shimmer and other glottal variations while cover...

Bibliographic details
Published in:	IEEE transactions on audio, speech, and language processing, 2013-08, Vol.21 (8), p.1560-1572
Main authors:	Schleusing, O., Kinnunen, T., Story, B., Vesin, J-M
Format:	Article
Language:	eng
Subjects:	Applied sciences; Detection, estimation, filtering, equalization, prediction; differential evolution; Estimation; Exact sciences and technology; Global optimization; glottal inverse filtering; Information, signal and communications theory; joint source-filter optimization; Joints; Mathematical model; Optimization; Production; Signal and communications theory; Signal processing; Signal, noise; Speech; Speech processing; Telecommunications and information theory; time-varying vocal tract estimation; Vectors

container_end_page	1572
container_issue	8
container_start_page	1560
container_title	IEEE transactions on audio, speech, and language processing
container_volume	21
creator	Schleusing, O. ; Kinnunen, T. ; Story, B. ; Vesin, J-M
description	In this work, we present a joint source-filter optimization approach for separating voiced speech into vocal tract (VT) and voice source components. The presented method is pitch-synchronous and thereby exhibits a high robustness against vocal jitter, shimmer and other glottal variations while covering various voice qualities. The voice source is modeled using the Liljencrants-Fant (LF) model, which is integrated into a time-varying auto-regressive speech production model with exogenous input (ARX). The non-convex optimization problem of finding the optimal model parameters is addressed by a heuristic, evolutionary optimization method called differential evolution. The optimization method is first validated in a series of experiments with synthetic speech. Estimated glottal source and VT parameters are the criteria used for comparison with the iterative adaptive inverse filter (IAIF) method and the linear prediction (LP) method under varying conditions such as jitter, fundamental frequency (f0) as well as environmental and glottal noise. The results show that the proposed method largely reduces the bias and standard deviation of estimated VT coefficients and glottal source parameters. Furthermore, the performance of the source-filter separation is evaluated in experiments using speech generated with a physical model of speech production. The proposed method reliably estimates glottal flow waveforms and lower formant frequencies. Results obtained for higher formant frequencies indicate that research on more accurate voice source models and their interaction with the VT is necessary to improve the source-filter separation. The proposed optimization approach promises to be a useful tool for future research addressing this topic.
doi_str_mv	10.1109/TASL.2013.2255275
format	Article
fullrecord	<record><control><sourceid>pascalfrancis_RIE</sourceid><recordid>TN_cdi_crossref_primary_10_1109_TASL_2013_2255275</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>6488745</ieee_id><sourcerecordid>27572193</sourcerecordid><originalsourceid>FETCH-LOGICAL-c404t-2cc3cbdc36e66567afb081af906d12eb20bcc0dd18f9ff0e305c933e11d0b55b3</originalsourceid><addsrcrecordid>eNo9kMtOwzAQRS0EEqXwAYiNNyxTPHbsJMuqtDxUqYu2bCNnYiOjkFS2iwRfT6JUXc1I99x5XELugc0AWPG0m2_XM85AzDiXkmfygkxAyjzJCp5enntQ1-QmhC_GUqFSmBD93rk20m139GiSlWui8XRziO7b_enoupbaztM54tHraOhHh7qhO68x0mXoqZHZB9d-0mdnrfGmja5nlj9dcxzEW3JldRPM3alOyX613C1ek_Xm5W0xXyeYsjQmHFFgVaNQRimpMm0rloO2BVM1cFNxViGyuobcFtYyI5jEQggDULNKykpMCYxz0XcheGPLg-_v878lsHLIqBwyKoeMylNGvedx9Bx06B-zXrfowtnYIxmHfsuUPIycM8acZZXmeZZK8Q9iunH3</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Joint Source-Filter Optimization for Accurate Vocal Tract Estimation Using Differential Evolution</title><source>IEEE Xplore</source><creator>Schleusing, O. ; Kinnunen, T. ; Story, B. ; Vesin, J-M</creator><creatorcontrib>Schleusing, O. ; Kinnunen, T. ; Story, B. ; Vesin, J-M</creatorcontrib><description>In this work, we present a joint source-filter optimization approach for separating voiced speech into vocal tract (VT) and voice source components. The presented method is pitch-synchronous and thereby exhibits a high robustness against vocal jitter, shimmer and other glottal variations while covering various voice qualities. The voice source is modeled using the Liljencrants-Fant (LF) model, which is integrated into a time-varying auto-regressive speech production model with exogenous input (ARX). The non-convex optimization problem of finding the optimal model parameters is addressed by a heuristic, evolutionary optimization method called differential evolution. 
The optimization method is first validated in a series of experiments with synthetic speech. Estimated glottal source and VT parameters are the criteria used for comparison with the iterative adaptive inverse filter (IAIF) method and the linear prediction (LP) method under varying conditions such as jitter, fundamental frequency ( f 0 ) as well as environmental and glottal noise. The results show that the proposed method largely reduces the bias and standard deviation of estimated VT coefficients and glottal source parameters. Furthermore, the performance of the source-filter separation is evaluated in experiments using speech generated with a physical model of speech production. The proposed method reliably estimates glottal flow waveforms and lower formant frequencies. Results obtained for higher formant frequencies indicate that research on more accurate voice source models and their interaction with the VT is necessary to improve the source-filter separation. The proposed optimization approach promises to be a useful tool for future research addressing this topic.</description><identifier>ISSN: 1558-7916</identifier><identifier>EISSN: 1558-7924</identifier><identifier>DOI: 10.1109/TASL.2013.2255275</identifier><identifier>CODEN: ITASD8</identifier><language>eng</language><publisher>Piscataway, NJ: IEEE</publisher><subject>Applied sciences ; Detection, estimation, filtering, equalization, prediction ; differential evolution ; Estimation ; Exact sciences and technology ; Global optimization ; glottal inverse filtering ; Information, signal and communications theory ; joint source-filter optimization ; Joints ; Mathematical model ; Optimization ; Production ; Signal and communications theory ; Signal processing ; Signal, noise ; Speech ; Speech processing ; Telecommunications and information theory ; time-varying vocal tract estimation ; Vectors</subject><ispartof>IEEE transactions on audio, speech, and language processing, 2013-08, Vol.21 (8), 
p.1560-1572</ispartof><rights>2014 INIST-CNRS</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c404t-2cc3cbdc36e66567afb081af906d12eb20bcc0dd18f9ff0e305c933e11d0b55b3</citedby><cites>FETCH-LOGICAL-c404t-2cc3cbdc36e66567afb081af906d12eb20bcc0dd18f9ff0e305c933e11d0b55b3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/6488745$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,780,784,796,27923,27924,54757</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/6488745$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&idt=27572193$$DView record in Pascal Francis$$Hfree_for_read</backlink></links><search><creatorcontrib>Schleusing, O.</creatorcontrib><creatorcontrib>Kinnunen, T.</creatorcontrib><creatorcontrib>Story, B.</creatorcontrib><creatorcontrib>Vesin, J-M</creatorcontrib><title>Joint Source-Filter Optimization for Accurate Vocal Tract Estimation Using Differential Evolution</title><title>IEEE transactions on audio, speech, and language processing</title><addtitle>TASL</addtitle><description>In this work, we present a joint source-filter optimization approach for separating voiced speech into vocal tract (VT) and voice source components. The presented method is pitch-synchronous and thereby exhibits a high robustness against vocal jitter, shimmer and other glottal variations while covering various voice qualities. The voice source is modeled using the Liljencrants-Fant (LF) model, which is integrated into a time-varying auto-regressive speech production model with exogenous input (ARX). 
The non-convex optimization problem of finding the optimal model parameters is addressed by a heuristic, evolutionary optimization method called differential evolution. The optimization method is first validated in a series of experiments with synthetic speech. Estimated glottal source and VT parameters are the criteria used for comparison with the iterative adaptive inverse filter (IAIF) method and the linear prediction (LP) method under varying conditions such as jitter, fundamental frequency ( f 0 ) as well as environmental and glottal noise. The results show that the proposed method largely reduces the bias and standard deviation of estimated VT coefficients and glottal source parameters. Furthermore, the performance of the source-filter separation is evaluated in experiments using speech generated with a physical model of speech production. The proposed method reliably estimates glottal flow waveforms and lower formant frequencies. Results obtained for higher formant frequencies indicate that research on more accurate voice source models and their interaction with the VT is necessary to improve the source-filter separation. 
The proposed optimization approach promises to be a useful tool for future research addressing this topic.</description><subject>Applied sciences</subject><subject>Detection, estimation, filtering, equalization, prediction</subject><subject>differential evolution</subject><subject>Estimation</subject><subject>Exact sciences and technology</subject><subject>Global optimization</subject><subject>glottal inverse filtering</subject><subject>Information, signal and communications theory</subject><subject>joint source-filter optimization</subject><subject>Joints</subject><subject>Mathematical model</subject><subject>Optimization</subject><subject>Production</subject><subject>Signal and communications theory</subject><subject>Signal processing</subject><subject>Signal, noise</subject><subject>Speech</subject><subject>Speech processing</subject><subject>Telecommunications and information theory</subject><subject>time-varying vocal tract estimation</subject><subject>Vectors</subject><issn>1558-7916</issn><issn>1558-7924</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2013</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNo9kMtOwzAQRS0EEqXwAYiNNyxTPHbsJMuqtDxUqYu2bCNnYiOjkFS2iwRfT6JUXc1I99x5XELugc0AWPG0m2_XM85AzDiXkmfygkxAyjzJCp5enntQ1-QmhC_GUqFSmBD93rk20m139GiSlWui8XRziO7b_enoupbaztM54tHraOhHh7qhO68x0mXoqZHZB9d-0mdnrfGmja5nlj9dcxzEW3JldRPM3alOyX613C1ek_Xm5W0xXyeYsjQmHFFgVaNQRimpMm0rloO2BVM1cFNxViGyuobcFtYyI5jEQggDULNKykpMCYxz0XcheGPLg-_v878lsHLIqBwyKoeMylNGvedx9Bx06B-zXrfowtnYIxmHfsuUPIycM8acZZXmeZZK8Q9iunH3</recordid><startdate>20130801</startdate><enddate>20130801</enddate><creator>Schleusing, O.</creator><creator>Kinnunen, T.</creator><creator>Story, B.</creator><creator>Vesin, J-M</creator><general>IEEE</general><general>Institute of Electrical and Electronics 
Engineers</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>IQODW</scope><scope>AAYXX</scope><scope>CITATION</scope></search><sort><creationdate>20130801</creationdate><title>Joint Source-Filter Optimization for Accurate Vocal Tract Estimation Using Differential Evolution</title><author>Schleusing, O. ; Kinnunen, T. ; Story, B. ; Vesin, J-M</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c404t-2cc3cbdc36e66567afb081af906d12eb20bcc0dd18f9ff0e305c933e11d0b55b3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2013</creationdate><topic>Applied sciences</topic><topic>Detection, estimation, filtering, equalization, prediction</topic><topic>differential evolution</topic><topic>Estimation</topic><topic>Exact sciences and technology</topic><topic>Global optimization</topic><topic>glottal inverse filtering</topic><topic>Information, signal and communications theory</topic><topic>joint source-filter optimization</topic><topic>Joints</topic><topic>Mathematical model</topic><topic>Optimization</topic><topic>Production</topic><topic>Signal and communications theory</topic><topic>Signal processing</topic><topic>Signal, noise</topic><topic>Speech</topic><topic>Speech processing</topic><topic>Telecommunications and information theory</topic><topic>time-varying vocal tract estimation</topic><topic>Vectors</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Schleusing, O.</creatorcontrib><creatorcontrib>Kinnunen, T.</creatorcontrib><creatorcontrib>Story, B.</creatorcontrib><creatorcontrib>Vesin, J-M</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Xplore</collection><collection>Pascal-Francis</collection><collection>CrossRef</collection><jtitle>IEEE transactions on audio, speech, and 
language processing</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Schleusing, O.</au><au>Kinnunen, T.</au><au>Story, B.</au><au>Vesin, J-M</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Joint Source-Filter Optimization for Accurate Vocal Tract Estimation Using Differential Evolution</atitle><jtitle>IEEE transactions on audio, speech, and language processing</jtitle><stitle>TASL</stitle><date>2013-08-01</date><risdate>2013</risdate><volume>21</volume><issue>8</issue><spage>1560</spage><epage>1572</epage><pages>1560-1572</pages><issn>1558-7916</issn><eissn>1558-7924</eissn><coden>ITASD8</coden><abstract>In this work, we present a joint source-filter optimization approach for separating voiced speech into vocal tract (VT) and voice source components. The presented method is pitch-synchronous and thereby exhibits a high robustness against vocal jitter, shimmer and other glottal variations while covering various voice qualities. The voice source is modeled using the Liljencrants-Fant (LF) model, which is integrated into a time-varying auto-regressive speech production model with exogenous input (ARX). The non-convex optimization problem of finding the optimal model parameters is addressed by a heuristic, evolutionary optimization method called differential evolution. The optimization method is first validated in a series of experiments with synthetic speech. Estimated glottal source and VT parameters are the criteria used for comparison with the iterative adaptive inverse filter (IAIF) method and the linear prediction (LP) method under varying conditions such as jitter, fundamental frequency ( f 0 ) as well as environmental and glottal noise. The results show that the proposed method largely reduces the bias and standard deviation of estimated VT coefficients and glottal source parameters. 
Furthermore, the performance of the source-filter separation is evaluated in experiments using speech generated with a physical model of speech production. The proposed method reliably estimates glottal flow waveforms and lower formant frequencies. Results obtained for higher formant frequencies indicate that research on more accurate voice source models and their interaction with the VT is necessary to improve the source-filter separation. The proposed optimization approach promises to be a useful tool for future research addressing this topic.</abstract><cop>Piscataway, NJ</cop><pub>IEEE</pub><doi>10.1109/TASL.2013.2255275</doi><tpages>13</tpages><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier	ISSN: 1558-7916
ispartof	IEEE transactions on audio, speech, and language processing, 2013-08, Vol.21 (8), p.1560-1572
issn	1558-7916 1558-7924
language	eng
recordid	cdi_crossref_primary_10_1109_TASL_2013_2255275
source	IEEE Xplore
subjects	Applied sciences; Detection, estimation, filtering, equalization, prediction; differential evolution; Estimation; Exact sciences and technology; Global optimization; glottal inverse filtering; Information, signal and communications theory; joint source-filter optimization; Joints; Mathematical model; Optimization; Production; Signal and communications theory; Signal processing; Signal, noise; Speech; Speech processing; Telecommunications and information theory; time-varying vocal tract estimation; Vectors
title	Joint Source-Filter Optimization for Accurate Vocal Tract Estimation Using Differential Evolution
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-10T14%3A03%3A41IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-pascalfrancis_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Joint%20Source-Filter%20Optimization%20for%20Accurate%20Vocal%20Tract%20Estimation%20Using%20Differential%20Evolution&rft.jtitle=IEEE%20transactions%20on%20audio,%20speech,%20and%20language%20processing&rft.au=Schleusing,%20O.&rft.date=2013-08-01&rft.volume=21&rft.issue=8&rft.spage=1560&rft.epage=1572&rft.pages=1560-1572&rft.issn=1558-7916&rft.eissn=1558-7924&rft.coden=ITASD8&rft_id=info:doi/10.1109/TASL.2013.2255275&rft_dat=%3Cpascalfrancis_RIE%3E27572193%3C/pascalfrancis_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=6488745&rfr_iscdi=true
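
Illustrative note: the abstract names differential evolution as the heuristic used for the non-convex parameter search. The record gives no implementation details, so the following is only a minimal sketch of the generic DE/rand/1/bin scheme applied to a standard non-convex test function (Rastrigin). It is not the authors' ARX-LF cost function or configuration; the function names, population size, weights and bounds are all assumptions chosen for illustration.

```python
import math
import random


def rastrigin(x):
    """Non-convex test function with many local minima (assumed stand-in cost)."""
    return 10 * len(x) + sum(xi * xi - 10 * math.cos(2 * math.pi * xi) for xi in x)


def differential_evolution(cost, bounds, pop_size=30, weight=0.7, cross_prob=0.9,
                           generations=300, seed=0):
    """Minimize `cost` inside the box `bounds` using generic DE/rand/1/bin.

    All default settings are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial population, one candidate parameter vector per member.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [cost(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct members other than the current target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # at least one coordinate is mutated
            trial = []
            for d in range(dim):
                if d == forced or rng.random() < cross_prob:
                    value = pop[a][d] + weight * (pop[b][d] - pop[c][d])
                    lo, hi = bounds[d]
                    value = min(max(value, lo), hi)  # clip to the search box
                else:
                    value = pop[i][d]
                trial.append(value)
            trial_score = cost(trial)
            # Greedy selection: keep the trial only if it is no worse.
            if trial_score <= scores[i]:
                pop[i], scores[i] = trial, trial_score
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


best_x, best_cost = differential_evolution(rastrigin, [(-5.12, 5.12)] * 2)
print(best_x, best_cost)
```

Because selection is greedy, the best cost in the population never increases from one generation to the next, which is one reason this family of methods is attractive when no gradient is available. In practice a library routine such as `scipy.optimize.differential_evolution` would likely be preferable to a hand-written loop.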