GMM/SVM N-best speaker identification under mismatch channel conditions
Main authors: | Zeljkovic, I. ; Haffner, P. ; Amento, B. ; Wilpon, J. |
---|---|
Format: | Conference proceeding |
Language: | eng |
Subjects: | |
Online access: | Order full text |
container_end_page | 4132 |
---|---|
container_issue | |
container_start_page | 4129 |
container_title | |
container_volume | |
creator | Zeljkovic, I. ; Haffner, P. ; Amento, B. ; Wilpon, J. |
description | Under severe channel mismatch conditions, such as training with far-field speech and testing with telephone data, performance of speaker identification (SID) degrades significantly, often below practical use. But for many SID tasks, it is sufficient to recognize an N-best list of speakers for further human analysis. We investigate N-best SID accuracy for matched (telephone/telephone) and mismatched (far-field/telephone) train/test channel conditions. Using an SVM-GMM supervector (GSV), pitch and formant frequency histograms (PFH) and cross-channel adaptation using cohorts, we reduced matched channel error rate by over 25% relative to the baseline (GMM-UBM), for top-1, and achieved mismatched N-best accuracy comparable to the baseline. |
doi_str_mv | 10.1109/ICASSP.2008.4518563 |
format | Conference Proceeding |
fulltext | fulltext_linktorsrc |
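identifier | ISSN: 1520-6149 ; EISSN: 2379-190X ; ISBN: 9781424414833 ; ISBN: 1424414830 ; EISBN: 1424414849 ; EISBN: 9781424414840 ; DOI: 10.1109/ICASSP.2008.4518563 |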
ispartof | 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, 2008, p.4129-4132 |
issn | 1520-6149 ; 2379-190X |
language | eng |
recordid | cdi_ieee_primary_4518563 |
source | IEEE Electronic Library (IEL) Conference Proceedings |
subjects | cohort speaker adaptation ; Degradation ; formants ; GMM ; Histograms ; Humans ; Internet telephony ; Microphones ; Robustness ; Speaker identification ; Speech analysis ; Strontium ; Support vector machines ; SVM ; Testing |
title | GMM/SVM N-best speaker identification under mismatch channel conditions |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-05T21%3A17%3A01IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=GMM/SVM%20N-best%20speaker%20identification%20under%20mismatch%20channel%20conditions&rft.btitle=2008%20IEEE%20International%20Conference%20on%20Acoustics,%20Speech%20and%20Signal%20Processing&rft.au=Zeljkovic,%20I.&rft.date=2008-03&rft.spage=4129&rft.epage=4132&rft.pages=4129-4132&rft.issn=1520-6149&rft.eissn=2379-190X&rft.isbn=9781424414833&rft.isbn_list=1424414830&rft_id=info:doi/10.1109/ICASSP.2008.4518563&rft_dat=%3Cieee_6IE%3E4518563%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&rft.eisbn=1424414849&rft.eisbn_list=9781424414840&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=4518563&rfr_iscdi=true |
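
The abstract reports results as N-best accuracy and as an error-rate reduction relative to a baseline. As a reading aid, the sketch below shows one common way these two quantities are computed. It is not the authors' code: the function names, the speaker labels, and every score and error rate in it are hypothetical, and the record does not state how the paper itself scored its trials.

```python
from typing import Dict, List, Sequence


def n_best(scores: Dict[str, float], n: int) -> List[str]:
    """Return the n highest-scoring speaker labels, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:n]


def n_best_accuracy(
    trials: Sequence[Dict[str, float]], truths: Sequence[str], n: int
) -> float:
    """Fraction of trials whose true speaker appears in the N-best list."""
    hits = sum(truth in n_best(scores, n) for scores, truth in zip(trials, truths))
    return hits / len(truths)


def relative_error_reduction(baseline_error: float, new_error: float) -> float:
    """Error reduction expressed as a fraction of the baseline error."""
    return (baseline_error - new_error) / baseline_error


# Hypothetical per-trial scores for three enrolled speakers (not from the paper).
trials = [
    {"spk_a": 0.9, "spk_b": 0.5, "spk_c": 0.1},
    {"spk_a": 0.6, "spk_b": 0.7, "spk_c": 0.2},
    {"spk_a": 0.3, "spk_b": 0.8, "spk_c": 0.4},
    {"spk_a": 0.2, "spk_b": 0.1, "spk_c": 0.9},
]
truths = ["spk_a", "spk_a", "spk_a", "spk_c"]

top1 = n_best_accuracy(trials, truths, 1)
top2 = n_best_accuracy(trials, truths, 2)
top3 = n_best_accuracy(trials, truths, 3)

# Hypothetical error rates: a drop from 0.40 to 0.30 is a 25% relative reduction.
reduction = relative_error_reduction(0.40, 0.30)
```

Accuracy can only stay the same or rise as N grows, which is why an N-best list handed to a human analyst can remain useful even when top-1 identification is unreliable.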