Diarization of Telephone Conversations Using Factor Analysis

We report on work on speaker diarization of telephone conversations which was begun at the Robust Speaker Recognition Workshop held at Johns Hopkins University in 2008. Three diarization systems were developed and experiments were conducted using the summed-channel telephone data from the 2008 NIST...

Bibliographic details
Published in:	IEEE journal of selected topics in signal processing 2010-12, Vol.4 (6), p.1059-1070
Main authors:	Kenny, Patrick; Reynolds, Douglas; Castaldo, Fabio
Format:	Article
Language:	eng
Subjects:	Adaptation model; Agglomeration; Bayesian analysis; Bayesian methods; Channel factors; Clustering; Clustering methods; Conversation; diarization; Errors; Guidelines; Hidden Markov models; Reduction; speaker factors; Speaker recognition; speaker segmentation; Speech; Speech recognition; Studies; Telephones; variational Bayes

container_end_page	1070
container_issue	6
container_start_page	1059
container_title	IEEE journal of selected topics in signal processing
container_volume	4
creator	Kenny, Patrick; Reynolds, Douglas; Castaldo, Fabio
description	We report on work on speaker diarization of telephone conversations which was begun at the Robust Speaker Recognition Workshop held at Johns Hopkins University in 2008. Three diarization systems were developed and experiments were conducted using the summed-channel telephone data from the 2008 NIST speaker recognition evaluation. The systems are a Baseline agglomerative clustering system, a Streaming system which uses speaker factors for speaker change point detection and traditional methods for speaker clustering, and a Variational Bayes system designed to exploit a large number of speaker factors as in state of the art speaker recognition systems. The Variational Bayes system proved to be the most effective, achieving a diarization error rate of 1.0% on the summed-channel data. This represents an 85% reduction in errors compared with the Baseline agglomerative clustering system. An interesting aspect of the Variational Bayes approach is that it implicitly performs speaker clustering in a way which avoids making premature hard decisions. This type of soft speaker clustering can be incorporated into other diarization systems (although causality has to be sacrificed in the case of the Streaming system). With this modification, the Baseline system achieved a diarization error rate of 3.5% (a 50% reduction in errors).
doi_str_mv	10.1109/JSTSP.2010.2081790
format	Article
coden	IJSTGY
publisher	New York: IEEE
rights	Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) Dec 2010
tpages	12
fulltext	fulltext_linktorsrc
identifier	ISSN: 1932-4553
ispartof	IEEE journal of selected topics in signal processing, 2010-12, Vol.4 (6), p.1059-1070
issn	1932-4553 1941-0484
language	eng
recordid	cdi_crossref_primary_10_1109_JSTSP_2010_2081790
source	IEEE Electronic Library (IEL)
subjects	Adaptation model; Agglomeration; Bayesian analysis; Bayesian methods; Channel factors; Clustering; Clustering methods; Conversation; diarization; Errors; Guidelines; Hidden Markov models; Reduction; speaker factors; Speaker recognition; speaker segmentation; Speech; Speech recognition; Studies; Telephones; variational Bayes
title	Diarization of Telephone Conversations Using Factor Analysis
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-26T04%3A02%3A56IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Diarization%20of%20Telephone%20Conversations%20Using%20Factor%20Analysis&rft.jtitle=IEEE%20journal%20of%20selected%20topics%20in%20signal%20processing&rft.au=Kenny,%20Patrick&rft.date=2010-12&rft.volume=4&rft.issue=6&rft.spage=1059&rft.epage=1070&rft.pages=1059-1070&rft.issn=1932-4553&rft.eissn=1941-0484&rft.coden=IJSTGY&rft_id=info:doi/10.1109/JSTSP.2010.2081790&rft_dat=%3Cproquest_RIE%3E849487478%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1030156939&rft_id=info:pmid/&rft_ieee_id=5587872&rfr_iscdi=true