Speaker diarization in meeting audio
This paper describes a speaker diarization system evaluated on the NIST Rich Transcription 2007 (RT-07) meeting recognition evaluation data set for the multiple distant microphone (MDM) task. Our implementation includes three components: initial clustering, non-speech removal and cluster purification. Initial...
Main authors: | Nwe, T.L. ; Hanwu Sun ; Haizhou Li ; Rahardja, S. |
---|---|
Format: | Conference proceeding |
Language: | eng |
container_end_page | 4076 |
---|---|
container_issue | |
container_start_page | 4073 |
container_title | |
container_volume | |
creator | Nwe, T.L. ; Hanwu Sun ; Haizhou Li ; Rahardja, S. |
description | This paper describes a speaker diarization system evaluated on the NIST Rich Transcription 2007 (RT-07) meeting recognition evaluation data set for the multiple distant microphone (MDM) task. Our implementation includes three components: initial clustering, non-speech removal and cluster purification. Initial clusters are generated using direction of arrival (DOA) information and bootstrap clustering. Multiple GMM modeling for speech/non-speech classification is employed in the non-speech removal component. In addition, a novel system fusion strategy using information from the receiver operating characteristic (ROC) curve is proposed for the non-speech removal component. Finally, a consensus clustering approach together with an iterative GMM clustering method is employed for speaker cluster purification. The system achieves an overall diarization error rate (DER) of 10.81%. |
doi_str_mv | 10.1109/ICASSP.2009.4960523 |
format | Conference Proceeding |
publisher | IEEE |
date | 2009-04 |
isbn | 9781424423538 1424423538 |
eisbn | 9781424423545 1424423546 |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 1520-6149 |
ispartof | 2009 IEEE International Conference on Acoustics, Speech and Signal Processing, 2009, p.4073-4076 |
issn | 1520-6149 2379-190X |
language | eng |
recordid | cdi_ieee_primary_4960523 |
source | IEEE Electronic Library (IEL) Conference Proceedings |
subjects | Adaptive filters ; clustering methods ; Conferences ; Direction of arrival estimation ; Erbium ; Machine learning ; Meetings ; modeling ; Natural languages ; pattern classification ; Purification ; Speech processing ; Sun ; Tin |
title | Speaker diarization in meeting audio |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-19T23%3A51%3A53IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Speaker%20diarization%20in%20meeting%20audio&rft.btitle=2009%20IEEE%20International%20Conference%20on%20Acoustics,%20Speech%20and%20Signal%20Processing&rft.au=Nwe,%20T.L.&rft.date=2009-04&rft.spage=4073&rft.epage=4076&rft.pages=4073-4076&rft.issn=1520-6149&rft.eissn=2379-190X&rft.isbn=9781424423538&rft.isbn_list=1424423538&rft_id=info:doi/10.1109/ICASSP.2009.4960523&rft_dat=%3Cieee_6IE%3E4960523%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&rft.eisbn=9781424423545&rft.eisbn_list=1424423546&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=4960523&rfr_iscdi=true |
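
The abstract reports an overall DER of 10.81%. As a rough illustration of what that figure measures, the sketch below computes a diarization error rate under the commonly used definition: missed speech, false-alarm speech and speaker-confusion time, summed and divided by the total scored speech time. The function name and the durations are hypothetical and are not taken from the paper, and the record does not state the exact scoring settings behind the reported number.

```python
def diarization_error_rate(missed, false_alarm, speaker_error, total_speech):
    """Return DER as a fraction of scored speech time (all arguments in seconds)."""
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    # DER sums the three error types and normalizes by the reference speech time.
    return (missed + false_alarm + speaker_error) / total_speech


# Hypothetical error durations in seconds, chosen only for illustration.
der = diarization_error_rate(
    missed=30.0, false_alarm=20.0, speaker_error=50.0, total_speech=1000.0
)
print(f"DER = {der:.2%}")
```

Because false-alarm time is counted in the numerator but not in the denominator, a DER above 100% is possible in principle.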