Analysis of the Effects of Supraglottal Tract Surgical Procedures in Automatic Speaker Recognition Performance

Bibliographic details
Published in: IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2020, Vol. 28, pp. 798-812
Main authors: Moro-Velaquez, Laureano, Hernandez-Garcia, Estefania, Gomez-Garcia, Jorge A., Godino-Llorente, Juan I., Dehak, Najim
Format: Article
Language: English
container_end_page 812
container_issue
container_start_page 798
container_title IEEE/ACM transactions on audio, speech, and language processing
container_volume 28
creator Moro-Velaquez, Laureano
Hernandez-Garcia, Estefania
Gomez-Garcia, Jorge A.
Godino-Llorente, Juan I.
Dehak, Najim
description This article evaluates the impact of three surgical procedures that modify the supraglottal tract structures of speakers on the performance of state-of-the-art automatic speaker recognition schemes. To do so, a new corpus (Cuco) was recorded, containing the speech of 107 speakers before and after surgery. Speakers were divided into four groups depending on the type of surgery: tonsillectomy, functional endoscopic sinus surgery (FESS), septoplasty, and controls. The analyzed speaker recognition schemes were i-vectors, i-vectors with a supervised Universal Background Model, i-vectors employing Time-Delay Deep Neural Networks, and x-vectors. In all cases, probabilistic linear discriminant analysis (PLDA) was employed in the back-end. Results show changes in the speech of patients who underwent tonsillectomy or FESS, in contrast to controls and patients who had a septoplasty, for whom no significant variations are observed. These changes increase the Equal Error Rate (EER) of the analyzed speaker recognition schemes for the tonsillectomy and FESS groups when enrollment data recorded before surgery are used. Moreover, surgery has a similar influence on the speech of female and male speakers with respect to the analyzed schemes. Consequently, the results suggest that it is advisable to update a speaker's enrollment speech three months after supraglottal tract surgery, so that the effects of the operation and the post-operative recovery period do not degrade the performance of automatic speaker recognition systems.
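The description reports its results as changes in the Equal Error Rate (EER), the operating point at which the false-rejection rate (genuine trials rejected) equals the false-acceptance rate (impostor trials accepted). As a rough illustration only, and not the authors' evaluation code, here is a minimal sketch assuming that higher scores mean "same speaker" and that a simple sweep over the observed scores is precise enough; the function name `compute_eer` is hypothetical.

```python
def compute_eer(target_scores, nontarget_scores):
    """Approximate the EER by sweeping every observed score as a threshold.

    Hypothetical helper for illustration: a trial is accepted when its
    score is >= threshold. Returns (eer, threshold) at the sweep point
    where the false-rejection and false-acceptance rates are closest.
    """
    if not target_scores or not nontarget_scores:
        raise ValueError("need at least one target and one non-target score")
    best = None  # (|FRR - FAR|, EER estimate, threshold)
    for t in sorted(set(target_scores) | set(nontarget_scores)):
        # False rejection: genuine (target) trials scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: impostor (non-target) trials at or above it.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(frr - far)
        if best is None or gap < best[0]:
            best = (gap, (frr + far) / 2.0, t)
    return best[1], best[2]


# Overlapping toy scores: one genuine trial falls below one impostor trial.
print(compute_eer([0.4, 0.6, 0.8, 0.9], [0.1, 0.3, 0.5, 0.7]))
```

Comparing the EER obtained with pre-surgery enrollment against the EER obtained with post-surgery enrollment for the same test trials is one way to quantify the kind of degradation the description refers to.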
doi_str_mv 10.1109/TASLP.2020.2967567
format Article
publisher IEEE, Piscataway
coden ITASD8
rights Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2020
toplevel peer_reviewed
stitle TASLP
tpages 15
orcidid 0000-0002-6060-387X
0000-0002-3033-7005
linktohtml https://ieeexplore.ieee.org/document/8962163
fulltext fulltext_linktorsrc
identifier ISSN: 2329-9290
ispartof IEEE/ACM transactions on audio, speech, and language processing, 2020, Vol.28, p.798-812
issn 2329-9290
2329-9304
language eng
recordid cdi_ieee_primary_8962163
source IEEE Electronic Library (IEL)
subjects Artificial neural networks
Automatic speaker recognition
Discriminant analysis
Error analysis
Hospitals
Pathology
Performance evaluation
septoplasty
sinus surgery
Speaker recognition
Speech processing
Speech recognition
Surgery
tonsillectomy
title Analysis of the Effects of Supraglottal Tract Surgical Procedures in Automatic Speaker Recognition Performance
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-06T07%3A49%3A16IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Analysis%20of%20the%20Effects%20of%20Supraglottal%20Tract%20Surgical%20Procedures%20in%20Automatic%20Speaker%20Recognition%20Performance&rft.jtitle=IEEE/ACM%20transactions%20on%20audio,%20speech,%20and%20language%20processing&rft.au=Moro-Velaquez,%20Laureano&rft.date=2020&rft.volume=28&rft.spage=798&rft.epage=812&rft.pages=798-812&rft.issn=2329-9290&rft.eissn=2329-9304&rft.coden=ITASD8&rft_id=info:doi/10.1109/TASLP.2020.2967567&rft_dat=%3Cproquest_RIE%3E2352193890%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2352193890&rft_id=info:pmid/&rft_ieee_id=8962163&rfr_iscdi=true