GEDI: Gammachirp envelope distortion index for predicting intelligibility of enhanced speech

•A new objective measure for speech intelligibility is proposed.
•The proposed model is based on a signal-to-distortion ratio in the auditory envelope.
•Evaluation is performed with speech signals enhanced by nonlinear processing.
•The proposed model can predict human results more accurately than convent...

Bibliographic details
Published in: Speech communication 2020-10, Vol.123, p.43-58
Main authors: Yamamoto, Katsuhiko; Irino, Toshio; Araki, Shoko; Kinoshita, Keisuke; Nakatani, Tomohiro
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 58
container_issue
container_start_page 43
container_title Speech communication
container_volume 123
creator Yamamoto, Katsuhiko
Irino, Toshio
Araki, Shoko
Kinoshita, Keisuke
Nakatani, Tomohiro
description • A new objective measure for speech intelligibility is proposed. • The proposed model is based on a signal-to-distortion ratio in the auditory envelope. • Evaluation is performed with speech signals enhanced by nonlinear processing. • The proposed model can predict human results more accurately than conventional models. In this study, we propose a new concept, the gammachirp envelope distortion index (GEDI), based on the signal-to-distortion ratio in the auditory envelope, SDRenv, to predict the intelligibility of speech enhanced by nonlinear algorithms. The objective of GEDI is to calculate the distortion between enhanced and clean-speech representations in the domain of a temporal envelope extracted by the gammachirp auditory filterbank and modulation filterbank. We also extend GEDI with multi-resolution analysis (mr-GEDI) to predict the speech intelligibility of sounds under non-stationary noise conditions. We evaluate GEDI in terms of its intelligibility predictions for speech sounds enhanced by classic spectral subtraction and Wiener filtering methods. The predictions are compared with human results for various signal-to-noise ratio conditions with additive pink and babble noises. The results showed that mr-GEDI predicted the intelligibility curves better than the short-time objective intelligibility (STOI) measure, the extended-STOI (ESTOI) measure, and the hearing-aid speech perception index (HASPI) under pink-noise conditions, and better than HASPI under babble-noise conditions. The mr-GEDI method does not present an overestimation tendency and is considered a more conservative approach than STOI and ESTOI. Therefore, evaluation with mr-GEDI may provide additional information in the development of speech enhancement algorithms.
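The description above characterizes GEDI as a signal-to-distortion ratio computed on temporal envelopes of clean and enhanced speech. The following is only a minimal sketch of that idea, not the authors' implementation: it assumes a single broadband channel, and it swaps in a crude rectify-and-smooth envelope for the gammachirp auditory filterbank and modulation filterbank named in the abstract. The function names `temporal_envelope` and `sdr_env_db`, the `cutoff_hz` parameter, and the mean-square power definitions are all hypothetical choices made for illustration.

```python
import numpy as np


def temporal_envelope(x, fs, cutoff_hz=150.0):
    """Crude stand-in envelope (assumption, not a gammachirp filterbank):
    full-wave rectification followed by a moving-average low-pass."""
    win = max(1, int(fs / cutoff_hz))      # smoothing window length in samples
    kernel = np.ones(win) / win            # moving-average kernel
    return np.convolve(np.abs(x), kernel, mode="same")


def sdr_env_db(clean, enhanced, fs, eps=1e-12):
    """Envelope-domain signal-to-distortion ratio in dB (illustrative).

    The distortion is taken as the difference between the enhanced and
    clean envelopes; both powers are plain mean squares (assumption).
    """
    env_clean = temporal_envelope(clean, fs)
    env_enh = temporal_envelope(enhanced, fs)
    distortion = env_enh - env_clean
    p_signal = np.mean(env_clean ** 2)
    p_distortion = np.mean(distortion ** 2)
    return 10.0 * np.log10((p_signal + eps) / (p_distortion + eps))


if __name__ == "__main__":
    fs = 16000
    t = np.arange(fs) / fs
    # Amplitude-modulated tone as a toy "clean speech" signal.
    clean = (1 + 0.5 * np.sin(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 1000 * t)
    rng = np.random.default_rng(0)
    degraded = clean + 0.5 * rng.standard_normal(fs)
    print("SDRenv (dB):", sdr_env_db(clean, degraded, fs))
```

In this toy setting a larger envelope distortion lowers the ratio, which is the qualitative behavior the abstract relies on; mapping such a ratio to an intelligibility score would need further stages that the record does not describe.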
doi_str_mv 10.1016/j.specom.2020.06.001
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2449988895</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0167639320302363</els_id><sourcerecordid>2449988895</sourcerecordid><originalsourceid>FETCH-LOGICAL-c334t-bae8c29e4eb1bc6dd33332219aedf4311584867c452f67a2ae21ee34af4574603</originalsourceid><addsrcrecordid>eNp9kE9Lw0AQxRdRsFa_gYeA58T9l83GgyC11kLBi96EZbOZtBvSbNxNxX57t8SzcxkY3puZ90PoluCMYCLu2ywMYNw-o5jiDIsMY3KGZkQWNC2IpOdoFmVFKljJLtFVCC3GmEtJZ-hztXxePyQrvd9rs7N-SKD_hs4NkNQ2jM6P1vWJ7Wv4SRrnk8FDbc1o-20cjtB1dmsr29nxmLgmene6N1An8R8wu2t00eguwM1fn6OPl-X74jXdvK3Wi6dNahjjY1ppkIaWwKEilRF1zWJRSkoNdcMZIbnkUhSG57QRhaYaKAFgXDc8L7jAbI7upr2Dd18HCKNq3cH38aSinJellLLMo4pPKuNdCB4aNXi71_6oCFYnjqpVE0d14qiwUJFjtD1ONogJvi14FYyFU0rrwYyqdvb_Bb8SnH5b</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2449988895</pqid></control><display><type>article</type><title>GEDI: Gammachirp envelope distortion index for predicting intelligibility of enhanced speech</title><source>ScienceDirect Journals (5 years ago - present)</source><creator>Yamamoto, Katsuhiko ; Irino, Toshio ; Araki, Shoko ; Kinoshita, Keisuke ; Nakatani, Tomohiro</creator><creatorcontrib>Yamamoto, Katsuhiko ; Irino, Toshio ; Araki, Shoko ; Kinoshita, Keisuke ; Nakatani, Tomohiro</creatorcontrib><description>•A new objective measure for speech intelligibility is proposed.•The proposed model is based on a signal-to-distortion ratio in the auditory envelope.•Evaluation is performed with speech signals enhanced by nonlinear processing.•The proposed model can predict human results more accurate than conventional models. In this study, we propose a new concept, the gammachirp envelope distortion index (GEDI), based on the signal-to-distortion ratio in the auditory envelope, SDRenv, to predict the intelligibility of speech enhanced by nonlinear algorithms. 
The objective of GEDI is to calculate the distortion between enhanced and clean-speech representations in the domain of a temporal envelope extracted by the gammachirp auditory filterbank and modulation filterbank. We also extend GEDI with multi-resolution analysis (mr-GEDI) to predict the speech intelligibility of sounds under non-stationary noise conditions. We evaluate GEDI in terms of the speech intelligibility predictions of speech sounds enhanced by a classic spectral subtraction and a Wiener filtering method. The predictions are compared with human results for various signal-to-noise ratio conditions with additive pink and babble noises. The results showed that mr-GEDI predicted the intelligibility curves better than short-time objective intelligibility (STOI) measure, extended-STOI (ESTOI) measure, and hearing-aid speech perception index (HASPI) under pink-noise conditions, and better than HASPI under babble-noise conditions. The mr-GEDI method does not present an overestimation tendency and is considered a more conservative approach than STOI and ESTOI. Therefore, the evaluation with mr-GEDI may provide additional information in the development of speech enhancement algorithms.</description><identifier>ISSN: 0167-6393</identifier><identifier>EISSN: 1872-7182</identifier><identifier>DOI: 10.1016/j.specom.2020.06.001</identifier><language>eng</language><publisher>Amsterdam: Elsevier B.V</publisher><subject>Acoustics ; Algorithms ; Distortion ; Evaluation ; Hearing aids ; Intelligibility ; Noise ; Objective measure ; Signal to noise ratio ; Speech ; Speech enhancement ; Speech intelligibility ; Speech perception ; Speech processing ; Speech sounds ; Subtraction ; Wiener filtering</subject><ispartof>Speech communication, 2020-10, Vol.123, p.43-58</ispartof><rights>2020</rights><rights>Copyright Elsevier Science Ltd. 
Oct 2020</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c334t-bae8c29e4eb1bc6dd33332219aedf4311584867c452f67a2ae21ee34af4574603</citedby><cites>FETCH-LOGICAL-c334t-bae8c29e4eb1bc6dd33332219aedf4311584867c452f67a2ae21ee34af4574603</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://dx.doi.org/10.1016/j.specom.2020.06.001$$EHTML$$P50$$Gelsevier$$H</linktohtml><link.rule.ids>314,780,784,3550,27924,27925,45995</link.rule.ids></links><search><creatorcontrib>Yamamoto, Katsuhiko</creatorcontrib><creatorcontrib>Irino, Toshio</creatorcontrib><creatorcontrib>Araki, Shoko</creatorcontrib><creatorcontrib>Kinoshita, Keisuke</creatorcontrib><creatorcontrib>Nakatani, Tomohiro</creatorcontrib><title>GEDI: Gammachirp envelope distortion index for predicting intelligibility of enhanced speech</title><title>Speech communication</title><description>•A new objective measure for speech intelligibility is proposed.•The proposed model is based on a signal-to-distortion ratio in the auditory envelope.•Evaluation is performed with speech signals enhanced by nonlinear processing.•The proposed model can predict human results more accurate than conventional models. In this study, we propose a new concept, the gammachirp envelope distortion index (GEDI), based on the signal-to-distortion ratio in the auditory envelope, SDRenv, to predict the intelligibility of speech enhanced by nonlinear algorithms. The objective of GEDI is to calculate the distortion between enhanced and clean-speech representations in the domain of a temporal envelope extracted by the gammachirp auditory filterbank and modulation filterbank. We also extend GEDI with multi-resolution analysis (mr-GEDI) to predict the speech intelligibility of sounds under non-stationary noise conditions. 
We evaluate GEDI in terms of the speech intelligibility predictions of speech sounds enhanced by a classic spectral subtraction and a Wiener filtering method. The predictions are compared with human results for various signal-to-noise ratio conditions with additive pink and babble noises. The results showed that mr-GEDI predicted the intelligibility curves better than short-time objective intelligibility (STOI) measure, extended-STOI (ESTOI) measure, and hearing-aid speech perception index (HASPI) under pink-noise conditions, and better than HASPI under babble-noise conditions. The mr-GEDI method does not present an overestimation tendency and is considered a more conservative approach than STOI and ESTOI. Therefore, the evaluation with mr-GEDI may provide additional information in the development of speech enhancement algorithms.</description><subject>Acoustics</subject><subject>Algorithms</subject><subject>Distortion</subject><subject>Evaluation</subject><subject>Hearing aids</subject><subject>Intelligibility</subject><subject>Noise</subject><subject>Objective measure</subject><subject>Signal to noise ratio</subject><subject>Speech</subject><subject>Speech enhancement</subject><subject>Speech intelligibility</subject><subject>Speech perception</subject><subject>Speech processing</subject><subject>Speech sounds</subject><subject>Subtraction</subject><subject>Wiener 
filtering</subject><issn>0167-6393</issn><issn>1872-7182</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><recordid>eNp9kE9Lw0AQxRdRsFa_gYeA58T9l83GgyC11kLBi96EZbOZtBvSbNxNxX57t8SzcxkY3puZ90PoluCMYCLu2ywMYNw-o5jiDIsMY3KGZkQWNC2IpOdoFmVFKljJLtFVCC3GmEtJZ-hztXxePyQrvd9rs7N-SKD_hs4NkNQ2jM6P1vWJ7Wv4SRrnk8FDbc1o-20cjtB1dmsr29nxmLgmene6N1An8R8wu2t00eguwM1fn6OPl-X74jXdvK3Wi6dNahjjY1ppkIaWwKEilRF1zWJRSkoNdcMZIbnkUhSG57QRhaYaKAFgXDc8L7jAbI7upr2Dd18HCKNq3cH38aSinJellLLMo4pPKuNdCB4aNXi71_6oCFYnjqpVE0d14qiwUJFjtD1ONogJvi14FYyFU0rrwYyqdvb_Bb8SnH5b</recordid><startdate>202010</startdate><enddate>202010</enddate><creator>Yamamoto, Katsuhiko</creator><creator>Irino, Toshio</creator><creator>Araki, Shoko</creator><creator>Kinoshita, Keisuke</creator><creator>Nakatani, Tomohiro</creator><general>Elsevier B.V</general><general>Elsevier Science Ltd</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>7T9</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>202010</creationdate><title>GEDI: Gammachirp envelope distortion index for predicting intelligibility of enhanced speech</title><author>Yamamoto, Katsuhiko ; Irino, Toshio ; Araki, Shoko ; Kinoshita, Keisuke ; Nakatani, Tomohiro</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c334t-bae8c29e4eb1bc6dd33332219aedf4311584867c452f67a2ae21ee34af4574603</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><topic>Acoustics</topic><topic>Algorithms</topic><topic>Distortion</topic><topic>Evaluation</topic><topic>Hearing aids</topic><topic>Intelligibility</topic><topic>Noise</topic><topic>Objective measure</topic><topic>Signal to noise ratio</topic><topic>Speech</topic><topic>Speech enhancement</topic><topic>Speech intelligibility</topic><topic>Speech 
perception</topic><topic>Speech processing</topic><topic>Speech sounds</topic><topic>Subtraction</topic><topic>Wiener filtering</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Yamamoto, Katsuhiko</creatorcontrib><creatorcontrib>Irino, Toshio</creatorcontrib><creatorcontrib>Araki, Shoko</creatorcontrib><creatorcontrib>Kinoshita, Keisuke</creatorcontrib><creatorcontrib>Nakatani, Tomohiro</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Linguistics and Language Behavior Abstracts (LLBA)</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Speech communication</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Yamamoto, Katsuhiko</au><au>Irino, Toshio</au><au>Araki, Shoko</au><au>Kinoshita, Keisuke</au><au>Nakatani, Tomohiro</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>GEDI: Gammachirp envelope distortion index for predicting intelligibility of enhanced speech</atitle><jtitle>Speech communication</jtitle><date>2020-10</date><risdate>2020</risdate><volume>123</volume><spage>43</spage><epage>58</epage><pages>43-58</pages><issn>0167-6393</issn><eissn>1872-7182</eissn><abstract>•A new objective measure for speech intelligibility is proposed.•The proposed model is based on a signal-to-distortion ratio in the auditory envelope.•Evaluation is performed with speech signals enhanced by nonlinear processing.•The proposed model can predict human results more accurate than 
conventional models. In this study, we propose a new concept, the gammachirp envelope distortion index (GEDI), based on the signal-to-distortion ratio in the auditory envelope, SDRenv, to predict the intelligibility of speech enhanced by nonlinear algorithms. The objective of GEDI is to calculate the distortion between enhanced and clean-speech representations in the domain of a temporal envelope extracted by the gammachirp auditory filterbank and modulation filterbank. We also extend GEDI with multi-resolution analysis (mr-GEDI) to predict the speech intelligibility of sounds under non-stationary noise conditions. We evaluate GEDI in terms of the speech intelligibility predictions of speech sounds enhanced by a classic spectral subtraction and a Wiener filtering method. The predictions are compared with human results for various signal-to-noise ratio conditions with additive pink and babble noises. The results showed that mr-GEDI predicted the intelligibility curves better than short-time objective intelligibility (STOI) measure, extended-STOI (ESTOI) measure, and hearing-aid speech perception index (HASPI) under pink-noise conditions, and better than HASPI under babble-noise conditions. The mr-GEDI method does not present an overestimation tendency and is considered a more conservative approach than STOI and ESTOI. Therefore, the evaluation with mr-GEDI may provide additional information in the development of speech enhancement algorithms.</abstract><cop>Amsterdam</cop><pub>Elsevier B.V</pub><doi>10.1016/j.specom.2020.06.001</doi><tpages>16</tpages></addata></record>
fulltext fulltext
identifier ISSN: 0167-6393
ispartof Speech communication, 2020-10, Vol.123, p.43-58
issn 0167-6393
1872-7182
language eng
recordid cdi_proquest_journals_2449988895
source ScienceDirect Journals (5 years ago - present)
subjects Acoustics
Algorithms
Distortion
Evaluation
Hearing aids
Intelligibility
Noise
Objective measure
Signal to noise ratio
Speech
Speech enhancement
Speech intelligibility
Speech perception
Speech processing
Speech sounds
Subtraction
Wiener filtering
title GEDI: Gammachirp envelope distortion index for predicting intelligibility of enhanced speech
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-06T11%3A20%3A20IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=GEDI:%20Gammachirp%20envelope%20distortion%20index%20for%20predicting%20intelligibility%20of%20enhanced%20speech&rft.jtitle=Speech%20communication&rft.au=Yamamoto,%20Katsuhiko&rft.date=2020-10&rft.volume=123&rft.spage=43&rft.epage=58&rft.pages=43-58&rft.issn=0167-6393&rft.eissn=1872-7182&rft_id=info:doi/10.1016/j.specom.2020.06.001&rft_dat=%3Cproquest_cross%3E2449988895%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2449988895&rft_id=info:pmid/&rft_els_id=S0167639320302363&rfr_iscdi=true