Speaker diarization in meeting audio

This paper describes a speaker diarization system evaluated on the NIST Rich Transcription 2007 (RT-07) meeting recognition evaluation data set for the multiple distant microphone (MDM) task. Our implementation includes three components: initial clustering, non-speech removal and cluster purification. Initial...

Bibliographic details
Main authors: Nwe, T.L.; Hanwu Sun; Haizhou Li; Rahardja, S.
Format: Conference proceeding
Language: eng
Subjects:
Online access: Order full text
container_end_page 4076
container_issue
container_start_page 4073
container_title
container_volume
creator Nwe, T.L.
Hanwu Sun
Haizhou Li
Rahardja, S.
description This paper describes a speaker diarization system evaluated on the NIST Rich Transcription 2007 (RT-07) meeting recognition evaluation data set for the multiple distant microphone (MDM) task. Our implementation includes three components: initial clustering, non-speech removal and cluster purification. Initial clusters are generated using direction of arrival (DOA) information and bootstrap clustering. Multiple GMM modeling for speech/non-speech classification is employed in the non-speech removal component. In addition, a novel system fusion strategy using information from the receiver operating characteristic (ROC) curve is proposed for the non-speech removal component. Finally, a consensus clustering approach together with an iterative GMM clustering method is employed for speaker cluster purification. The system achieves an overall diarization error rate (DER) of 10.81%.
doi_str_mv 10.1109/ICASSP.2009.4960523
format Conference Proceeding
fulltext fulltext_linktorsrc
identifier ISSN: 1520-6149
ispartof 2009 IEEE International Conference on Acoustics, Speech and Signal Processing, 2009, p.4073-4076
issn 1520-6149
2379-190X
language eng
recordid cdi_ieee_primary_4960523
source IEEE Electronic Library (IEL) Conference Proceedings
subjects Adaptive filters
clustering methods
Conferences
Direction of arrival estimation
Erbium
Machine learning
Meetings
modeling
Natural languages
pattern classification
Purification
Speech processing
Sun
Tin
title Speaker diarization in meeting audio
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-19T23%3A51%3A53IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Speaker%20diarization%20in%20meeting%20audio&rft.btitle=2009%20IEEE%20International%20Conference%20on%20Acoustics,%20Speech%20and%20Signal%20Processing&rft.au=Nwe,%20T.L.&rft.date=2009-04&rft.spage=4073&rft.epage=4076&rft.pages=4073-4076&rft.issn=1520-6149&rft.eissn=2379-190X&rft.isbn=9781424423538&rft.isbn_list=1424423538&rft_id=info:doi/10.1109/ICASSP.2009.4960523&rft_dat=%3Cieee_6IE%3E4960523%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&rft.eisbn=9781424423545&rft.eisbn_list=1424423546&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=4960523&rfr_iscdi=true
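
The description field above reports an overall DER of 10.81%. As a rough illustration of what that figure measures, the sketch below computes DER in its conventional form: missed speech, false-alarm speech and speaker-confusion time, summed and divided by the total scored reference speech time. The class name, function name and the durations in the usage example are hypothetical and are not taken from the paper; the record does not say how the authors' scoring was configured.

```python
from dataclasses import dataclass


@dataclass
class DiarizationErrors:
    """Error durations in seconds (hypothetical container, not from the paper)."""
    missed_speech: float      # reference speech with no hypothesis speaker
    false_alarm: float        # hypothesis speech where the reference has none
    speaker_confusion: float  # speech attributed to the wrong speaker
    total_speech: float       # total scored reference speech time


def diarization_error_rate(errors: DiarizationErrors) -> float:
    """Return DER as a fraction: summed error time over total reference speech time."""
    if errors.total_speech <= 0:
        raise ValueError("total_speech must be positive")
    error_time = (errors.missed_speech
                  + errors.false_alarm
                  + errors.speaker_confusion)
    return error_time / errors.total_speech


if __name__ == "__main__":
    # Illustrative durations only; these are NOT the paper's error breakdown.
    example = DiarizationErrors(missed_speech=40.0, false_alarm=25.0,
                                speaker_confusion=60.0, total_speech=1000.0)
    print(f"DER = {diarization_error_rate(example):.2%}")
```

Because false alarms and confusions are counted against the reference speech time only, DER can exceed 100% for a poor system; the non-speech removal component named in the abstract targets the false-alarm term, while cluster purification targets the confusion term.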