GMM/SVM N-best speaker identification under mismatch channel conditions

Under severe channel mismatch conditions, such as training with far-field speech and testing with telephone data, performance of speaker identification (SID) degrades significantly, often below practical use. But for many SID tasks, it is sufficient to recognize an N-best list of speakers for furthe...

Bibliographic details
Main authors: Zeljkovic, I., Haffner, P., Amento, B., Wilpon, J.
Format: Conference proceeding
Language: eng
container_end_page 4132
container_issue
container_start_page 4129
container_title
container_volume
creator Zeljkovic, I.
Haffner, P.
Amento, B.
Wilpon, J.
description Under severe channel mismatch conditions, such as training with far-field speech and testing with telephone data, performance of speaker identification (SID) degrades significantly, often below practical use. But for many SID tasks, it is sufficient to recognize an N-best list of speakers for further human analysis. We investigate N-best SID accuracy for matched (telephone/telephone) and mismatched (far-field/telephone) train/test channel conditions. Using an SVM-GMM supervector (GSV), pitch and formant frequency histograms (PFH) and cross-channel adaptation using cohorts, we reduced matched channel error rate by over 25% relative to the baseline (GMM-UBM), for top-1, and achieved mismatched N-best accuracy comparable to the baseline.
doi_str_mv 10.1109/ICASSP.2008.4518563
format Conference Proceeding
fullrecord <record><control><sourceid>ieee_6IE</sourceid><recordid>TN_cdi_ieee_primary_4518563</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>4518563</ieee_id><sourcerecordid>4518563</sourcerecordid><originalsourceid>FETCH-LOGICAL-i175t-bf4d67cc0c0b04eb37dd0e1200d13ea2bafa60fbee17feab65f214049b2f8de63</originalsourceid><addsrcrecordid>eNo1UMtOwzAQNC-JUvoFveQHku76kcRHVEFAagGpgLhVfqxVQ5tWcTjw9wRR5jLSzGi1M4xNEQpE0LOH-c1q9VxwgLqQCmtVihN2hZJLibKW-pSNuKh0jhrez9hEV_W_J8Q5G6HikJco9SWbpPQBA6QSSqsRa5rlcrZ6W2aPuaXUZ-lA5pO6LHpq-xiiM33ct9lX6wdxF9PO9G6TuY1pW9pmbt_6-BtI1-wimG2iyZHH7PXu9mV-ny-emuH7RR6xUn1ug_Rl5Rw4sCDJisp7IByKeRRkuDXBlBAsEVaBjC1V4ChBastD7akUYzb9uxuJaH3o4s503-vjJuIHWsNSZw</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>GMM/SVM N-best speaker identification under mismatch channel conditions</title><source>IEEE Electronic Library (IEL) Conference Proceedings</source><creator>Zeljkovic, I. ; Haffner, P. ; Amento, B. ; Wilpon, J.</creator><creatorcontrib>Zeljkovic, I. ; Haffner, P. ; Amento, B. ; Wilpon, J.</creatorcontrib><description>Under severe channel mismatch conditions, such as training with far-field speech and testing with telephone data, performance of speaker identification (SID) degrades significantly, often below practical use. But for many SID tasks, it is sufficient to recognize an N-best list of speakers for further human analysis. We investigate N-best SID accuracy for matched (telephone/telephone) and mismatched (far-field/telephone) train/test channel conditions. 
Using an SVM-GMM supervector (GSV), pitch and formant frequency histograms (PFH) and cross-channel adaptation using cohorts, we reduced matched channel error rate by over 25% relative to the baseline (GMM-UBM), for top-1, and achieved mismatched N-best accuracy comparable to the baseline.</description><identifier>ISSN: 1520-6149</identifier><identifier>ISBN: 9781424414833</identifier><identifier>ISBN: 1424414830</identifier><identifier>EISSN: 2379-190X</identifier><identifier>EISBN: 1424414849</identifier><identifier>EISBN: 9781424414840</identifier><identifier>DOI: 10.1109/ICASSP.2008.4518563</identifier><language>eng</language><publisher>IEEE</publisher><subject>cohort speaker adaptation ; Degradation ; formants ; GMM ; Histograms ; Humans ; Internet telephony ; Microphones ; Robustness ; Speaker identification ; Speech analysis ; Strontium ; Support vector machines ; SVM ; Testing</subject><ispartof>2008 IEEE International Conference on Acoustics, Speech and Signal Processing, 2008, p.4129-4132</ispartof><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/4518563$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,776,780,785,786,2052,27902,54895</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/4518563$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Zeljkovic, I.</creatorcontrib><creatorcontrib>Haffner, P.</creatorcontrib><creatorcontrib>Amento, B.</creatorcontrib><creatorcontrib>Wilpon, J.</creatorcontrib><title>GMM/SVM N-best speaker identification under mismatch channel conditions</title><title>2008 IEEE International Conference on Acoustics, Speech and Signal Processing</title><addtitle>ICASSP</addtitle><description>Under severe channel mismatch conditions, such as training 
with far-field speech and testing with telephone data, performance of speaker identification (SID) degrades significantly, often below practical use. But for many SID tasks, it is sufficient to recognize an N-best list of speakers for further human analysis. We investigate N-best SID accuracy for matched (telephone/telephone) and mismatched (far-field/telephone) train/test channel conditions. Using an SVM-GMM supervector (GSV), pitch and formant frequency histograms (PFH) and cross-channel adaptation using cohorts, we reduced matched channel error rate by over 25% relative to the baseline (GMM-UBM), for top-1, and achieved mismatched N-best accuracy comparable to the baseline.</description><subject>cohort speaker adaptation</subject><subject>Degradation</subject><subject>formants</subject><subject>GMM</subject><subject>Histograms</subject><subject>Humans</subject><subject>Internet telephony</subject><subject>Microphones</subject><subject>Robustness</subject><subject>Speaker identification</subject><subject>Speech analysis</subject><subject>Strontium</subject><subject>Support vector machines</subject><subject>SVM</subject><subject>Testing</subject><issn>1520-6149</issn><issn>2379-190X</issn><isbn>9781424414833</isbn><isbn>1424414830</isbn><isbn>1424414849</isbn><isbn>9781424414840</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2008</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><sourceid>RIE</sourceid><recordid>eNo1UMtOwzAQNC-JUvoFveQHku76kcRHVEFAagGpgLhVfqxVQ5tWcTjw9wRR5jLSzGi1M4xNEQpE0LOH-c1q9VxwgLqQCmtVihN2hZJLibKW-pSNuKh0jhrez9hEV_W_J8Q5G6HikJco9SWbpPQBA6QSSqsRa5rlcrZ6W2aPuaXUZ-lA5pO6LHpq-xiiM33ct9lX6wdxF9PO9G6TuY1pW9pmbt_6-BtI1-wimG2iyZHH7PXu9mV-ny-emuH7RR6xUn1ug_Rl5Rw4sCDJisp7IByKeRRkuDXBlBAsEVaBjC1V4ChBastD7akUYzb9uxuJaH3o4s503-vjJuIHWsNSZw</recordid><startdate>200803</startdate><enddate>200803</enddate><creator>Zeljkovic, I.</creator><creator>Haffner, P.</creator><creator>Amento, 
B.</creator><creator>Wilpon, J.</creator><general>IEEE</general><scope>6IE</scope><scope>6IH</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIO</scope></search><sort><creationdate>200803</creationdate><title>GMM/SVM N-best speaker identification under mismatch channel conditions</title><author>Zeljkovic, I. ; Haffner, P. ; Amento, B. ; Wilpon, J.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-i175t-bf4d67cc0c0b04eb37dd0e1200d13ea2bafa60fbee17feab65f214049b2f8de63</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2008</creationdate><topic>cohort speaker adaptation</topic><topic>Degradation</topic><topic>formants</topic><topic>GMM</topic><topic>Histograms</topic><topic>Humans</topic><topic>Internet telephony</topic><topic>Microphones</topic><topic>Robustness</topic><topic>Speaker identification</topic><topic>Speech analysis</topic><topic>Strontium</topic><topic>Support vector machines</topic><topic>SVM</topic><topic>Testing</topic><toplevel>online_resources</toplevel><creatorcontrib>Zeljkovic, I.</creatorcontrib><creatorcontrib>Haffner, P.</creatorcontrib><creatorcontrib>Amento, B.</creatorcontrib><creatorcontrib>Wilpon, J.</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan (POP) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE Electronic Library (IEL)</collection><collection>IEEE Proceedings Order Plans (POP) 1998-present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Zeljkovic, I.</au><au>Haffner, P.</au><au>Amento, B.</au><au>Wilpon, J.</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>GMM/SVM N-best speaker identification under mismatch channel 
conditions</atitle><btitle>2008 IEEE International Conference on Acoustics, Speech and Signal Processing</btitle><stitle>ICASSP</stitle><date>2008-03</date><risdate>2008</risdate><spage>4129</spage><epage>4132</epage><pages>4129-4132</pages><issn>1520-6149</issn><eissn>2379-190X</eissn><isbn>9781424414833</isbn><isbn>1424414830</isbn><eisbn>1424414849</eisbn><eisbn>9781424414840</eisbn><abstract>Under severe channel mismatch conditions, such as training with far-field speech and testing with telephone data, performance of speaker identification (SID) degrades significantly, often below practical use. But for many SID tasks, it is sufficient to recognize an N-best list of speakers for further human analysis. We investigate N-best SID accuracy for matched (telephone/telephone) and mismatched (far-field/telephone) train/test channel conditions. Using an SVM-GMM supervector (GSV), pitch and formant frequency histograms (PFH) and cross-channel adaptation using cohorts, we reduced matched channel error rate by over 25% relative to the baseline (GMM-UBM), for top-1, and achieved mismatched N-best accuracy comparable to the baseline.</abstract><pub>IEEE</pub><doi>10.1109/ICASSP.2008.4518563</doi><tpages>4</tpages></addata></record>
fulltext fulltext_linktorsrc
identifier ISSN: 1520-6149
ispartof 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, 2008, p.4129-4132
issn 1520-6149
2379-190X
language eng
recordid cdi_ieee_primary_4518563
source IEEE Electronic Library (IEL) Conference Proceedings
subjects cohort speaker adaptation
Degradation
formants
GMM
Histograms
Humans
Internet telephony
Microphones
Robustness
Speaker identification
Speech analysis
Strontium
Support vector machines
SVM
Testing
title GMM/SVM N-best speaker identification under mismatch channel conditions
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-05T21%3A17%3A01IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=GMM/SVM%20N-best%20speaker%20identification%20under%20mismatch%20channel%20conditions&rft.btitle=2008%20IEEE%20International%20Conference%20on%20Acoustics,%20Speech%20and%20Signal%20Processing&rft.au=Zeljkovic,%20I.&rft.date=2008-03&rft.spage=4129&rft.epage=4132&rft.pages=4129-4132&rft.issn=1520-6149&rft.eissn=2379-190X&rft.isbn=9781424414833&rft.isbn_list=1424414830&rft_id=info:doi/10.1109/ICASSP.2008.4518563&rft_dat=%3Cieee_6IE%3E4518563%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&rft.eisbn=1424414849&rft.eisbn_list=9781424414840&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=4518563&rfr_iscdi=true