GMM/SVM N-best speaker identification under mismatch channel conditions

Under severe channel mismatch conditions, such as training with far-field speech and testing with telephone data, performance of speaker identification (SID) degrades significantly, often below practical use. But for many SID tasks, it is sufficient to recognize an N-best list of speakers for furthe...

Detailed description

Bibliographic details
Main authors:	Zeljkovic, I., Haffner, P., Amento, B., Wilpon, J.
Format:	Conference proceeding
Language:	eng
Subjects:	cohort speaker adaptation; Degradation; formants; GMM; Histograms; Humans; Internet telephony; Microphones; Robustness; Speaker identification; Speech analysis; Strontium; Support vector machines; SVM; Testing

container_end_page	4132
container_issue
container_start_page	4129
container_title
container_volume
creator	Zeljkovic, I. ; Haffner, P. ; Amento, B. ; Wilpon, J.
description	Under severe channel mismatch conditions, such as training with far-field speech and testing with telephone data, performance of speaker identification (SID) degrades significantly, often below practical use. But for many SID tasks, it is sufficient to recognize an N-best list of speakers for further human analysis. We investigate N-best SID accuracy for matched (telephone/telephone) and mismatched (far-field/telephone) train/test channel conditions. Using an SVM-GMM supervector (GSV), pitch and formant frequency histograms (PFH) and cross-channel adaptation using cohorts, we reduced matched channel error rate by over 25% relative to the baseline (GMM-UBM), for top-1, and achieved mismatched N-best accuracy comparable to the baseline.
doi_str_mv	10.1109/ICASSP.2008.4518563
format	Conference Proceeding
fullrecord	<record><control><sourceid>ieee_6IE</sourceid><recordid>TN_cdi_ieee_primary_4518563</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>4518563</ieee_id><sourcerecordid>4518563</sourcerecordid><originalsourceid>FETCH-LOGICAL-i175t-bf4d67cc0c0b04eb37dd0e1200d13ea2bafa60fbee17feab65f214049b2f8de63</originalsourceid><addsrcrecordid>eNo1UMtOwzAQNC-JUvoFveQHku76kcRHVEFAagGpgLhVfqxVQ5tWcTjw9wRR5jLSzGi1M4xNEQpE0LOH-c1q9VxwgLqQCmtVihN2hZJLibKW-pSNuKh0jhrez9hEV_W_J8Q5G6HikJco9SWbpPQBA6QSSqsRa5rlcrZ6W2aPuaXUZ-lA5pO6LHpq-xiiM33ct9lX6wdxF9PO9G6TuY1pW9pmbt_6-BtI1-wimG2iyZHH7PXu9mV-ny-emuH7RR6xUn1ug_Rl5Rw4sCDJisp7IByKeRRkuDXBlBAsEVaBjC1V4ChBastD7akUYzb9uxuJaH3o4s503-vjJuIHWsNSZw</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>GMM/SVM N-best speaker identification under mismatch channel conditions</title><source>IEEE Electronic Library (IEL) Conference Proceedings</source><creator>Zeljkovic, I. ; Haffner, P. ; Amento, B. ; Wilpon, J.</creator><creatorcontrib>Zeljkovic, I. ; Haffner, P. ; Amento, B. ; Wilpon, J.</creatorcontrib><description>Under severe channel mismatch conditions, such as training with far-field speech and testing with telephone data, performance of speaker identification (SID) degrades significantly, often below practical use. But for many SID tasks, it is sufficient to recognize an N-best list of speakers for further human analysis. We investigate N-best SID accuracy for matched (telephone/telephone) and mismatched (far-field/telephone) train/test channel conditions. 
Using an SVM-GMM supervector (GSV), pitch and formant frequency histograms (PFH) and cross-channel adaptation using cohorts, we reduced matched channel error rate by over 25% relative to the baseline (GMM-UBM), for top-1, and achieved mismatched N-best accuracy comparable to the baseline.</description><identifier>ISSN: 1520-6149</identifier><identifier>ISBN: 9781424414833</identifier><identifier>ISBN: 1424414830</identifier><identifier>EISSN: 2379-190X</identifier><identifier>EISBN: 1424414849</identifier><identifier>EISBN: 9781424414840</identifier><identifier>DOI: 10.1109/ICASSP.2008.4518563</identifier><language>eng</language><publisher>IEEE</publisher><subject>cohort speaker adaptation ; Degradation ; formants ; GMM ; Histograms ; Humans ; Internet telephony ; Microphones ; Robustness ; Speaker identification ; Speech analysis ; Strontium ; Support vector machines ; SVM ; Testing</subject><ispartof>2008 IEEE International Conference on Acoustics, Speech and Signal Processing, 2008, p.4129-4132</ispartof><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/4518563$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,776,780,785,786,2052,27902,54895</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/4518563$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Zeljkovic, I.</creatorcontrib><creatorcontrib>Haffner, P.</creatorcontrib><creatorcontrib>Amento, B.</creatorcontrib><creatorcontrib>Wilpon, J.</creatorcontrib><title>GMM/SVM N-best speaker identification under mismatch channel conditions</title><title>2008 IEEE International Conference on Acoustics, Speech and Signal Processing</title><addtitle>ICASSP</addtitle><description>Under severe channel mismatch conditions, such as training 
with far-field speech and testing with telephone data, performance of speaker identification (SID) degrades significantly, often below practical use. But for many SID tasks, it is sufficient to recognize an N-best list of speakers for further human analysis. We investigate N-best SID accuracy for matched (telephone/telephone) and mismatched (far-field/telephone) train/test channel conditions. Using an SVM-GMM supervector (GSV), pitch and formant frequency histograms (PFH) and cross-channel adaptation using cohorts, we reduced matched channel error rate by over 25% relative to the baseline (GMM-UBM), for top-1, and achieved mismatched N-best accuracy comparable to the baseline.</description><subject>cohort speaker adaptation</subject><subject>Degradation</subject><subject>formants</subject><subject>GMM</subject><subject>Histograms</subject><subject>Humans</subject><subject>Internet telephony</subject><subject>Microphones</subject><subject>Robustness</subject><subject>Speaker identification</subject><subject>Speech analysis</subject><subject>Strontium</subject><subject>Support vector machines</subject><subject>SVM</subject><subject>Testing</subject><issn>1520-6149</issn><issn>2379-190X</issn><isbn>9781424414833</isbn><isbn>1424414830</isbn><isbn>1424414849</isbn><isbn>9781424414840</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2008</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><sourceid>RIE</sourceid><recordid>eNo1UMtOwzAQNC-JUvoFveQHku76kcRHVEFAagGpgLhVfqxVQ5tWcTjw9wRR5jLSzGi1M4xNEQpE0LOH-c1q9VxwgLqQCmtVihN2hZJLibKW-pSNuKh0jhrez9hEV_W_J8Q5G6HikJco9SWbpPQBA6QSSqsRa5rlcrZ6W2aPuaXUZ-lA5pO6LHpq-xiiM33ct9lX6wdxF9PO9G6TuY1pW9pmbt_6-BtI1-wimG2iyZHH7PXu9mV-ny-emuH7RR6xUn1ug_Rl5Rw4sCDJisp7IByKeRRkuDXBlBAsEVaBjC1V4ChBastD7akUYzb9uxuJaH3o4s503-vjJuIHWsNSZw</recordid><startdate>200803</startdate><enddate>200803</enddate><creator>Zeljkovic, I.</creator><creator>Haffner, P.</creator><creator>Amento, 
B.</creator><creator>Wilpon, J.</creator><general>IEEE</general><scope>6IE</scope><scope>6IH</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIO</scope></search><sort><creationdate>200803</creationdate><title>GMM/SVM N-best speaker identification under mismatch channel conditions</title><author>Zeljkovic, I. ; Haffner, P. ; Amento, B. ; Wilpon, J.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-i175t-bf4d67cc0c0b04eb37dd0e1200d13ea2bafa60fbee17feab65f214049b2f8de63</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2008</creationdate><topic>cohort speaker adaptation</topic><topic>Degradation</topic><topic>formants</topic><topic>GMM</topic><topic>Histograms</topic><topic>Humans</topic><topic>Internet telephony</topic><topic>Microphones</topic><topic>Robustness</topic><topic>Speaker identification</topic><topic>Speech analysis</topic><topic>Strontium</topic><topic>Support vector machines</topic><topic>SVM</topic><topic>Testing</topic><toplevel>online_resources</toplevel><creatorcontrib>Zeljkovic, I.</creatorcontrib><creatorcontrib>Haffner, P.</creatorcontrib><creatorcontrib>Amento, B.</creatorcontrib><creatorcontrib>Wilpon, J.</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan (POP) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE Electronic Library (IEL)</collection><collection>IEEE Proceedings Order Plans (POP) 1998-present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Zeljkovic, I.</au><au>Haffner, P.</au><au>Amento, B.</au><au>Wilpon, J.</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>GMM/SVM N-best speaker identification under mismatch channel 
conditions</atitle><btitle>2008 IEEE International Conference on Acoustics, Speech and Signal Processing</btitle><stitle>ICASSP</stitle><date>2008-03</date><risdate>2008</risdate><spage>4129</spage><epage>4132</epage><pages>4129-4132</pages><issn>1520-6149</issn><eissn>2379-190X</eissn><isbn>9781424414833</isbn><isbn>1424414830</isbn><eisbn>1424414849</eisbn><eisbn>9781424414840</eisbn><abstract>Under severe channel mismatch conditions, such as training with far-field speech and testing with telephone data, performance of speaker identification (SID) degrades significantly, often below practical use. But for many SID tasks, it is sufficient to recognize an N-best list of speakers for further human analysis. We investigate N-best SID accuracy for matched (telephone/telephone) and mismatched (far-field/telephone) train/test channel conditions. Using an SVM-GMM supervector (GSV), pitch and formant frequency histograms (PFH) and cross-channel adaptation using cohorts, we reduced matched channel error rate by over 25% relative to the baseline (GMM-UBM), for top-1, and achieved mismatched N-best accuracy comparable to the baseline.</abstract><pub>IEEE</pub><doi>10.1109/ICASSP.2008.4518563</doi><tpages>4</tpages></addata></record>
fulltext	fulltext_linktorsrc
identifier	ISSN: 1520-6149
ispartof	2008 IEEE International Conference on Acoustics, Speech and Signal Processing, 2008, p.4129-4132
issn	1520-6149 2379-190X
language	eng
recordid	cdi_ieee_primary_4518563
source	IEEE Electronic Library (IEL) Conference Proceedings
subjects	cohort speaker adaptation ; Degradation ; formants ; GMM ; Histograms ; Humans ; Internet telephony ; Microphones ; Robustness ; Speaker identification ; Speech analysis ; Strontium ; Support vector machines ; SVM ; Testing
title	GMM/SVM N-best speaker identification under mismatch channel conditions
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-05T21%3A17%3A01IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=GMM/SVM%20N-best%20speaker%20identification%20under%20mismatch%20channel%20conditions&rft.btitle=2008%20IEEE%20International%20Conference%20on%20Acoustics,%20Speech%20and%20Signal%20Processing&rft.au=Zeljkovic,%20I.&rft.date=2008-03&rft.spage=4129&rft.epage=4132&rft.pages=4129-4132&rft.issn=1520-6149&rft.eissn=2379-190X&rft.isbn=9781424414833&rft.isbn_list=1424414830&rft_id=info:doi/10.1109/ICASSP.2008.4518563&rft_dat=%3Cieee_6IE%3E4518563%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&rft.eisbn=1424414849&rft.eisbn_list=9781424414840&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=4518563&rfr_iscdi=true