Frame selection of interview channel for NIST speaker recognition evaluation

In this paper, we study a front-end frame selection approach for the interview channel speaker recognition system. This new approach keeps the high quality speech frames and removes noisy and irrelevant speech frames for speaker modeling. For robust voice activity detection (VAD) under the different...

Bibliographic details
Main authors:	Hanwu Sun, Bin Ma, Haizhou Li
Format:	Conference proceeding
Language:	eng
Subjects:	distant microphone; GMM-SVM; interview channel; Interviews; Microphones; NIST; Speaker recognition; Speech; Speech processing; Speech recognition
Online access:	Order full text

container_end_page	308
container_issue
container_start_page	305
container_title
container_volume
creator	Hanwu Sun; Bin Ma; Haizhou Li
description	In this paper, we study a front-end frame selection approach for the interview channel speaker recognition system. This new approach keeps the high quality speech frames and removes noisy and irrelevant speech frames for speaker modeling. For robust voice activity detection (VAD) under the different types of microphones located in the interview room, we adopt the spectral subtraction algorithm for noise reduction. An energy based frame selection algorithm is first applied to indicate the speech activity at the frame level. To overcome the summed channel effects in the interview condition, a study is conducted to effectively extract the relevant speaker's speech frames based on VAD Tags and ASR transcript Tags provided by NIST. The eigenchannel based GMM-SVM speaker recognition system is used to evaluate the proposed method. The experiments are conducted on the NIST 2008 and NIST 2010 Speaker Recognition Evaluation interview-interview conditions. It demonstrates that the approach provides an efficient way to select high quality speech frames and the relevant speaker's voice in the interview environment for speaker recognition.
doi_str_mv	10.1109/ISCSLP.2010.5684886
format	Conference Proceeding
fullrecord	<record><control><sourceid>ieee_6IE</sourceid><recordid>TN_cdi_ieee_primary_5684886</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>5684886</ieee_id><sourcerecordid>5684886</sourcerecordid><originalsourceid>FETCH-LOGICAL-i90t-370d513d17f00662403834bee3999b07043258f42c63f0ff2b1fd3dc9c6d57a23</originalsourceid><addsrcrecordid>eNo1T8tOwzAQNEJIQMkX9OIfSFk_4thHVFGIFAFScq8cZw2GNKmcUMTfk0KZyzykXc0QsmSwYgzMbVGtq_JlxWEOMqWl1uqMJCbXTHIpFZfKnJPrfyPlJUnG8R1mZEoak12RchPtDumIHbopDD0dPA39hPEQ8Iu6N9v32FE_RPpUVDUd92g_MNKIbnjtw-8FHmz3aY_yhlx4242YnHhB6s19vX5My-eHYn1XpsHAlIoc2oyJluUeQM29QGghG0RhjGkgByl4pr3kTgkP3vOG-Va0zjjVZrnlYkGWf28DIm73Mexs_N6e9osfPxdOxg</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>Frame selection of interview channel for NIST speaker recognition evaluation</title><source>IEEE Electronic Library (IEL) Conference Proceedings</source><creator>Hanwu Sun ; Bin Ma ; Haizhou Li</creator><creatorcontrib>Hanwu Sun ; Bin Ma ; Haizhou Li</creatorcontrib><description>In this paper, we study a front-end frame selection approach for the interview channel speaker recognition system. This new approach keeps the high quality speech frames and removes noisy and irrelevant speech frames for speaker modeling. For robust voice activity detection (VAD) under the different types of microphones located in the interview room, we adopt the spectral subtraction algorithm for noise reduction. An energy based frame selection algorithm is first applied to indicate the speech activity at the frame level. To overcome the summed channel effects in the interview condition, a study is conducted to effectively extract the relevant speaker's speech frames based on VAD Tags and ASR transcript Tags provided by NIST. The eigenchannel based GMM-SVM speaker recognition system is used to evaluate the proposed method. 
The experiments are conducted on the NIST 2008 and NIST 2010 Speaker Recognition Evaluation interview-interview conditions. It demonstrates that the approach provides an efficient way to select high quality speech frames and the relevant speaker's voice in the interview environment for speaker recognition.</description><identifier>ISBN: 1424462444</identifier><identifier>ISBN: 9781424462445</identifier><identifier>EISBN: 9781424462469</identifier><identifier>EISBN: 1424462460</identifier><identifier>EISBN: 1424462452</identifier><identifier>EISBN: 9781424462452</identifier><identifier>DOI: 10.1109/ISCSLP.2010.5684886</identifier><language>eng</language><publisher>IEEE</publisher><subject>distant microphone ; GMM-SVM ; interview channel ; Interviews ; Microphones ; NIST ; Speaker recognition ; Speech ; Speech processing ; Speech recognition</subject><ispartof>2010 7th International Symposium on Chinese Spoken Language Processing, 2010, p.305-308</ispartof><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/5684886$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,780,784,789,790,2058,27925,54920</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/5684886$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Hanwu Sun</creatorcontrib><creatorcontrib>Bin Ma</creatorcontrib><creatorcontrib>Haizhou Li</creatorcontrib><title>Frame selection of interview channel for NIST speaker recognition evaluation</title><title>2010 7th International Symposium on Chinese Spoken Language Processing</title><addtitle>ISCSLP</addtitle><description>In this paper, we study a front-end frame selection approach for the interview channel speaker recognition system. 
This new approach keeps the high quality speech frames and removes noisy and irrelevant speech frames for speaker modeling. For robust voice activity detection (VAD) under the different types of microphones located in the interview room, we adopt the spectral subtraction algorithm for noise reduction. An energy based frame selection algorithm is first applied to indicate the speech activity at the frame level. To overcome the summed channel effects in the interview condition, a study is conducted to effectively extract the relevant speaker's speech frames based on VAD Tags and ASR transcript Tags provided by NIST. The eigenchannel based GMM-SVM speaker recognition system is used to evaluate the proposed method. The experiments are conducted on the NIST 2008 and NIST 2010 Speaker Recognition Evaluation interview-interview conditions. It demonstrates that the approach provides an efficient way to select high quality speech frames and the relevant speaker's voice in the interview environment for speaker recognition.</description><subject>distant microphone</subject><subject>GMM-SVM</subject><subject>interview channel</subject><subject>Interviews</subject><subject>Microphones</subject><subject>NIST</subject><subject>Speaker recognition</subject><subject>Speech</subject><subject>Speech processing</subject><subject>Speech 
recognition</subject><isbn>1424462444</isbn><isbn>9781424462445</isbn><isbn>9781424462469</isbn><isbn>1424462460</isbn><isbn>1424462452</isbn><isbn>9781424462452</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2010</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><sourceid>RIE</sourceid><recordid>eNo1T8tOwzAQNEJIQMkX9OIfSFk_4thHVFGIFAFScq8cZw2GNKmcUMTfk0KZyzykXc0QsmSwYgzMbVGtq_JlxWEOMqWl1uqMJCbXTHIpFZfKnJPrfyPlJUnG8R1mZEoak12RchPtDumIHbopDD0dPA39hPEQ8Iu6N9v32FE_RPpUVDUd92g_MNKIbnjtw-8FHmz3aY_yhlx4242YnHhB6s19vX5My-eHYn1XpsHAlIoc2oyJluUeQM29QGghG0RhjGkgByl4pr3kTgkP3vOG-Va0zjjVZrnlYkGWf28DIm73Mexs_N6e9osfPxdOxg</recordid><startdate>201011</startdate><enddate>201011</enddate><creator>Hanwu Sun</creator><creator>Bin Ma</creator><creator>Haizhou Li</creator><general>IEEE</general><scope>6IE</scope><scope>6IL</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIL</scope></search><sort><creationdate>201011</creationdate><title>Frame selection of interview channel for NIST speaker recognition evaluation</title><author>Hanwu Sun ; Bin Ma ; Haizhou Li</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-i90t-370d513d17f00662403834bee3999b07043258f42c63f0ff2b1fd3dc9c6d57a23</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2010</creationdate><topic>distant microphone</topic><topic>GMM-SVM</topic><topic>interview channel</topic><topic>Interviews</topic><topic>Microphones</topic><topic>NIST</topic><topic>Speaker recognition</topic><topic>Speech</topic><topic>Speech processing</topic><topic>Speech recognition</topic><toplevel>online_resources</toplevel><creatorcontrib>Hanwu Sun</creatorcontrib><creatorcontrib>Bin Ma</creatorcontrib><creatorcontrib>Haizhou Li</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan All 
Online (POP All Online) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE Electronic Library (IEL)</collection><collection>IEEE Proceedings Order Plans (POP All) 1998-Present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Hanwu Sun</au><au>Bin Ma</au><au>Haizhou Li</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>Frame selection of interview channel for NIST speaker recognition evaluation</atitle><btitle>2010 7th International Symposium on Chinese Spoken Language Processing</btitle><stitle>ISCSLP</stitle><date>2010-11</date><risdate>2010</risdate><spage>305</spage><epage>308</epage><pages>305-308</pages><isbn>1424462444</isbn><isbn>9781424462445</isbn><eisbn>9781424462469</eisbn><eisbn>1424462460</eisbn><eisbn>1424462452</eisbn><eisbn>9781424462452</eisbn><abstract>In this paper, we study a front-end frame selection approach for the interview channel speaker recognition system. This new approach keeps the high quality speech frames and removes noisy and irrelevant speech frames for speaker modeling. For robust voice activity detection (VAD) under the different types of microphones located in the interview room, we adopt the spectral subtraction algorithm for noise reduction. An energy based frame selection algorithm is first applied to indicate the speech activity at the frame level. To overcome the summed channel effects in the interview condition, a study is conducted to effectively extract the relevant speaker's speech frames based on VAD Tags and ASR transcript Tags provided by NIST. The eigenchannel based GMM-SVM speaker recognition system is used to evaluate the proposed method. The experiments are conducted on the NIST 2008 and NIST 2010 Speaker Recognition Evaluation interview-interview conditions. 
It demonstrates that the approach provides an efficient way to select high quality speech frames and the relevant speaker's voice in the interview environment for speaker recognition.</abstract><pub>IEEE</pub><doi>10.1109/ISCSLP.2010.5684886</doi><tpages>4</tpages></addata></record>
fulltext	fulltext_linktorsrc
identifier	ISBN: 1424462444
ispartof	2010 7th International Symposium on Chinese Spoken Language Processing, 2010, p.305-308
issn
language	eng
recordid	cdi_ieee_primary_5684886
source	IEEE Electronic Library (IEL) Conference Proceedings
subjects	distant microphone; GMM-SVM; interview channel; Interviews; Microphones; NIST; Speaker recognition; Speech; Speech processing; Speech recognition
title	Frame selection of interview channel for NIST speaker recognition evaluation
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-04T11%3A44%3A07IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Frame%20selection%20of%20interview%20channel%20for%20NIST%20speaker%20recognition%20evaluation&rft.btitle=2010%207th%20International%20Symposium%20on%20Chinese%20Spoken%20Language%20Processing&rft.au=Hanwu%20Sun&rft.date=2010-11&rft.spage=305&rft.epage=308&rft.pages=305-308&rft.isbn=1424462444&rft.isbn_list=9781424462445&rft_id=info:doi/10.1109/ISCSLP.2010.5684886&rft_dat=%3Cieee_6IE%3E5684886%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&rft.eisbn=9781424462469&rft.eisbn_list=1424462460&rft.eisbn_list=1424462452&rft.eisbn_list=9781424462452&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=5684886&rfr_iscdi=true