Statistical approach to enhancing esophageal speech based on Gaussian mixture models

This paper presents a novel method of enhancing esophageal speech using statistical voice conversion. Esophageal speech is one of the alternative speaking methods for laryngectomees. Although it doesn't require any external devices, generated voices sound unnatural. To improve the intelligibili...

Bibliographic details
Main authors: Doi, Hironori; Nakamura, Keigo; Toda, Tomoki; Saruwatari, Hiroshi; Shikano, Kiyohiro
Format: Conference proceeding
Language: eng
container_end_page 4253
container_issue
container_start_page 4250
container_title
container_volume
creator Doi, Hironori
Nakamura, Keigo
Toda, Tomoki
Saruwatari, Hiroshi
Shikano, Kiyohiro
description This paper presents a novel method of enhancing esophageal speech using statistical voice conversion. Esophageal speech is one of the alternative speaking methods for laryngectomees. Although it doesn't require any external devices, generated voices sound unnatural. To improve the intelligibility and naturalness of esophageal speech, we propose a voice conversion method from esophageal speech into normal speech. A spectral parameter and excitation parameters of target normal speech are separately estimated from a spectral parameter of the esophageal speech based on Gaussian mixture models. The experimental results demonstrate that the proposed method yields significant improvements in intelligibility and naturalness. We also apply one-to-many eigenvoice conversion to esophageal speech enhancement for flexibly controlling enhanced voice quality.
doi_str_mv 10.1109/ICASSP.2010.5495676
format Conference Proceeding
fullrecord <record><control><sourceid>ieee_6IE</sourceid><recordid>TN_cdi_ieee_primary_5495676</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>5495676</ieee_id><sourcerecordid>5495676</sourcerecordid><originalsourceid>FETCH-LOGICAL-c298t-12d4ffa8089693689ab05b53935f9f1aecbd51f233ff0679995c27529646d1853</originalsourceid><addsrcrecordid>eNpVkN1Kw0AUhNc_sNY-QW_2BVL3P3supWgVCgqp4F05Sc62K20Ssino2xuwN14NzAwD3zA2l2IhpYCH1-VjUbwvlBgNa8C63F2wGeReGmWMUeDcJZsonUMmQXxe_cssXLOJtEpkThq4ZXcpfQkhfG78hG2KAYeYhljhgWPX9S1Wez60nJo9NlVsdpxS2-1xR2MhdURjXGKimrcNX-EppYgNP8bv4dQTP7Y1HdI9uwl4SDQ765R9PD9tli_Z-m01oqyzSoEfMqlqEwJ64cGBdh6wFLa0GrQNECRSVdZWBqV1CMLlAGArlduR1rhaequnbP63G4lo2_XxiP3P9nyQ_gWrdlYS</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>Statistical approach to enhancing esophageal speech based on Gaussian mixture models</title><source>IEEE Electronic Library (IEL) Conference Proceedings</source><creator>Doi, Hironori ; Nakamura, Keigo ; Toda, Tomoki ; Saruwatari, Hiroshi ; Shikano, Kiyohiro</creator><creatorcontrib>Doi, Hironori ; Nakamura, Keigo ; Toda, Tomoki ; Saruwatari, Hiroshi ; Shikano, Kiyohiro</creatorcontrib><description>This paper presents a novel method of enhancing esophageal speech using statistical voice conversion. Esophageal speech is one of the alternative speaking methods for laryngectomees. Although it doesn't require any external devices, generated voices sound unnatural. To improve the intelligibility and naturalness of esophageal speech, we propose a voice conversion method from esophageal speech into normal speech. A spectral parameter and excitation parameters of target normal speech are separately estimated from a spectral parameter of the esophageal speech based on Gaussian mixture models. 
The experimental results demonstrate that the proposed method yields significant improvements in intelligibility and naturalness. We also apply one-to-many eigenvoice conversion to esophageal speech enhancement for flexibly controlling enhanced voice quality.</description><identifier>ISSN: 1520-6149</identifier><identifier>ISBN: 9781424442959</identifier><identifier>ISBN: 1424442958</identifier><identifier>EISSN: 2379-190X</identifier><identifier>EISBN: 9781424442966</identifier><identifier>EISBN: 1424442966</identifier><identifier>DOI: 10.1109/ICASSP.2010.5495676</identifier><language>eng</language><publisher>IEEE</publisher><subject>Acoustic noise ; Degradation ; eigenvoice conversion ; esophageal speech ; Esophagus ; Information science ; laryngectomees ; Loudspeakers ; Noise generators ; Speech analysis ; Speech enhancement ; Speech processing ; Virtual colonoscopy ; voice conversion</subject><ispartof>2010 IEEE International Conference on Acoustics, Speech and Signal Processing, 2010, p.4250-4253</ispartof><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c298t-12d4ffa8089693689ab05b53935f9f1aecbd51f233ff0679995c27529646d1853</citedby></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/5495676$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,780,784,789,790,2056,27923,54918</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/5495676$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Doi, Hironori</creatorcontrib><creatorcontrib>Nakamura, Keigo</creatorcontrib><creatorcontrib>Toda, Tomoki</creatorcontrib><creatorcontrib>Saruwatari, Hiroshi</creatorcontrib><creatorcontrib>Shikano, Kiyohiro</creatorcontrib><title>Statistical approach to enhancing esophageal speech based on Gaussian 
mixture models</title><title>2010 IEEE International Conference on Acoustics, Speech and Signal Processing</title><addtitle>ICASSP</addtitle><description>This paper presents a novel method of enhancing esophageal speech using statistical voice conversion. Esophageal speech is one of the alternative speaking methods for laryngectomees. Although it doesn't require any external devices, generated voices sound unnatural. To improve the intelligibility and naturalness of esophageal speech, we propose a voice conversion method from esophageal speech into normal speech. A spectral parameter and excitation parameters of target normal speech are separately estimated from a spectral parameter of the esophageal speech based on Gaussian mixture models. The experimental results demonstrate that the proposed method yields significant improvements in intelligibility and naturalness. We also apply one-to-many eigenvoice conversion to esophageal speech enhancement for flexibly controlling enhanced voice quality.</description><subject>Acoustic noise</subject><subject>Degradation</subject><subject>eigenvoice conversion</subject><subject>esophageal speech</subject><subject>Esophagus</subject><subject>Information science</subject><subject>laryngectomees</subject><subject>Loudspeakers</subject><subject>Noise generators</subject><subject>Speech analysis</subject><subject>Speech enhancement</subject><subject>Speech processing</subject><subject>Virtual colonoscopy</subject><subject>voice 
conversion</subject><issn>1520-6149</issn><issn>2379-190X</issn><isbn>9781424442959</isbn><isbn>1424442958</isbn><isbn>9781424442966</isbn><isbn>1424442966</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2010</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><sourceid>RIE</sourceid><recordid>eNpVkN1Kw0AUhNc_sNY-QW_2BVL3P3supWgVCgqp4F05Sc62K20Ssino2xuwN14NzAwD3zA2l2IhpYCH1-VjUbwvlBgNa8C63F2wGeReGmWMUeDcJZsonUMmQXxe_cssXLOJtEpkThq4ZXcpfQkhfG78hG2KAYeYhljhgWPX9S1Wez60nJo9NlVsdpxS2-1xR2MhdURjXGKimrcNX-EppYgNP8bv4dQTP7Y1HdI9uwl4SDQ765R9PD9tli_Z-m01oqyzSoEfMqlqEwJ64cGBdh6wFLa0GrQNECRSVdZWBqV1CMLlAGArlduR1rhaequnbP63G4lo2_XxiP3P9nyQ_gWrdlYS</recordid><startdate>201003</startdate><enddate>201003</enddate><creator>Doi, Hironori</creator><creator>Nakamura, Keigo</creator><creator>Toda, Tomoki</creator><creator>Saruwatari, Hiroshi</creator><creator>Shikano, Kiyohiro</creator><general>IEEE</general><scope>6IE</scope><scope>6IH</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIO</scope></search><sort><creationdate>201003</creationdate><title>Statistical approach to enhancing esophageal speech based on Gaussian mixture models</title><author>Doi, Hironori ; Nakamura, Keigo ; Toda, Tomoki ; Saruwatari, Hiroshi ; Shikano, Kiyohiro</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c298t-12d4ffa8089693689ab05b53935f9f1aecbd51f233ff0679995c27529646d1853</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2010</creationdate><topic>Acoustic noise</topic><topic>Degradation</topic><topic>eigenvoice conversion</topic><topic>esophageal speech</topic><topic>Esophagus</topic><topic>Information science</topic><topic>laryngectomees</topic><topic>Loudspeakers</topic><topic>Noise generators</topic><topic>Speech analysis</topic><topic>Speech enhancement</topic><topic>Speech processing</topic><topic>Virtual 
colonoscopy</topic><topic>voice conversion</topic><toplevel>online_resources</toplevel><creatorcontrib>Doi, Hironori</creatorcontrib><creatorcontrib>Nakamura, Keigo</creatorcontrib><creatorcontrib>Toda, Tomoki</creatorcontrib><creatorcontrib>Saruwatari, Hiroshi</creatorcontrib><creatorcontrib>Shikano, Kiyohiro</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan (POP) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE Electronic Library (IEL)</collection><collection>IEEE Proceedings Order Plans (POP) 1998-present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Doi, Hironori</au><au>Nakamura, Keigo</au><au>Toda, Tomoki</au><au>Saruwatari, Hiroshi</au><au>Shikano, Kiyohiro</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>Statistical approach to enhancing esophageal speech based on Gaussian mixture models</atitle><btitle>2010 IEEE International Conference on Acoustics, Speech and Signal Processing</btitle><stitle>ICASSP</stitle><date>2010-03</date><risdate>2010</risdate><spage>4250</spage><epage>4253</epage><pages>4250-4253</pages><issn>1520-6149</issn><eissn>2379-190X</eissn><isbn>9781424442959</isbn><isbn>1424442958</isbn><eisbn>9781424442966</eisbn><eisbn>1424442966</eisbn><abstract>This paper presents a novel method of enhancing esophageal speech using statistical voice conversion. Esophageal speech is one of the alternative speaking methods for laryngectomees. Although it doesn't require any external devices, generated voices sound unnatural. To improve the intelligibility and naturalness of esophageal speech, we propose a voice conversion method from esophageal speech into normal speech. 
A spectral parameter and excitation parameters of target normal speech are separately estimated from a spectral parameter of the esophageal speech based on Gaussian mixture models. The experimental results demonstrate that the proposed method yields significant improvements in intelligibility and naturalness. We also apply one-to-many eigenvoice conversion to esophageal speech enhancement for flexibly controlling enhanced voice quality.</abstract><pub>IEEE</pub><doi>10.1109/ICASSP.2010.5495676</doi><tpages>4</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier ISSN: 1520-6149
ispartof 2010 IEEE International Conference on Acoustics, Speech and Signal Processing, 2010, p.4250-4253
issn 1520-6149
2379-190X
language eng
recordid cdi_ieee_primary_5495676
source IEEE Electronic Library (IEL) Conference Proceedings
subjects Acoustic noise
Degradation
eigenvoice conversion
esophageal speech
Esophagus
Information science
laryngectomees
Loudspeakers
Noise generators
Speech analysis
Speech enhancement
Speech processing
Virtual colonoscopy
voice conversion
title Statistical approach to enhancing esophageal speech based on Gaussian mixture models
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-09T13%3A16%3A26IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Statistical%20approach%20to%20enhancing%20esophageal%20speech%20based%20on%20Gaussian%20mixture%20models&rft.btitle=2010%20IEEE%20International%20Conference%20on%20Acoustics,%20Speech%20and%20Signal%20Processing&rft.au=Doi,%20Hironori&rft.date=2010-03&rft.spage=4250&rft.epage=4253&rft.pages=4250-4253&rft.issn=1520-6149&rft.eissn=2379-190X&rft.isbn=9781424442959&rft.isbn_list=1424442958&rft_id=info:doi/10.1109/ICASSP.2010.5495676&rft_dat=%3Cieee_6IE%3E5495676%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&rft.eisbn=9781424442966&rft.eisbn_list=1424442966&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=5495676&rfr_iscdi=true
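
The description field above states that parameters of the target normal speech are estimated from a spectral parameter of the esophageal speech based on Gaussian mixture models. The record does not give the estimation procedure, so the following is only a minimal sketch of one common form of GMM-based mapping: a frame-wise conditional-mean estimate under a joint source/target GMM. It is not presented as the authors' method. Features are reduced to scalars, and the names `gaussian_pdf`, `convert_frame`, and `TOY_GMM`, along with all parameter values, are hypothetical and chosen for illustration.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def convert_frame(x, components):
    """Map one source feature x to a target feature estimate.

    Each component is a dict describing one mixture of a joint GMM over
    (source, target) pairs: weight, mu_x, mu_y, var_x, cov_xy.
    The estimate is the posterior-weighted sum of per-component
    conditional means E[y | x, m].
    Assumption: x is close enough to the mixtures that the total
    likelihood does not underflow to zero.
    """
    likelihoods = [
        c["weight"] * gaussian_pdf(x, c["mu_x"], c["var_x"]) for c in components
    ]
    total = sum(likelihoods)
    y = 0.0
    for lik, c in zip(likelihoods, components):
        posterior = lik / total  # P(m | x)
        cond_mean = c["mu_y"] + c["cov_xy"] / c["var_x"] * (x - c["mu_x"])
        y += posterior * cond_mean
    return y


# Hypothetical two-component joint GMM (illustrative values, not from the paper).
TOY_GMM = [
    {"weight": 0.5, "mu_x": -1.0, "mu_y": -2.0, "var_x": 1.0, "cov_xy": 0.5},
    {"weight": 0.5, "mu_x": 1.0, "mu_y": 2.0, "var_x": 1.0, "cov_xy": 0.5},
]

if __name__ == "__main__":
    for x in (-1.0, 0.0, 1.0):
        print(x, convert_frame(x, TOY_GMM))
```

In a real system the scalar features would be replaced by spectral vectors with full covariance blocks, and the GMM parameters would be trained on time-aligned pairs of esophageal and normal speech. The description also says the spectral and excitation parameters are estimated separately, which in this sketch would correspond to separate component sets, one per target parameter.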