Online Speaker Clustering Using Incremental Learning of an Ergodic Hidden Markov Model

A novel online speaker clustering method based on a generative model is proposed. It employs an incremental variant of variational Bayesian learning and provides probabilistic (non-deterministic) decisions for each input utterance, on the basis of the history of preceding utterances. It can be expec...

Bibliographic details
Published in: IEICE Transactions on Information and Systems 2012/10/01, Vol.E95.D(10), pp.2469-2478
Main authors: KOSHINAKA, Takafumi, NAGATOMO, Kentaro, SHINODA, Koichi
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 2478
container_issue 10
container_start_page 2469
container_title IEICE Transactions on Information and Systems
container_volume E95.D
creator KOSHINAKA, Takafumi
NAGATOMO, Kentaro
SHINODA, Koichi
description A novel online speaker clustering method based on a generative model is proposed. It employs an incremental variant of variational Bayesian learning and provides probabilistic (non-deterministic) decisions for each input utterance, on the basis of the history of preceding utterances. It can be expected to be robust against errors in cluster estimation and the classification of utterances, and hence to be applicable to many real-time applications. Experimental results show that it produces 50% fewer classification errors than does a conventional online method. They also show that it is possible to reduce the number of speech recognition errors by combining the method with unsupervised speaker adaptation.
doi_str_mv 10.1587/transinf.E95.D.2469
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_1221893669</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>1221893669</sourcerecordid><originalsourceid>FETCH-LOGICAL-c541t-2d4c190563793bfb207fd5b599c3fbe924f03635ab76c1f0696dbb48988f987a3</originalsourceid><addsrcrecordid>eNpdkMtKAzEUhoMoWC9P4CYbwc3UXCaZyVJqvVFx4WUbMpmTGk0zNZkKvr1TqhXcnAOH7z8_fAidUDKmoq7O-2Ri9tGNp0qML8eslGoHjWhVioJySXfRiCgqi1pwto8Ocn4jhNaMihF6eYjBR8CPSzDvkPAkrHIPycc5fs7reRttggXE3gQ8A5Pi-tg5bCKepnnXeotvfNtCxPcmvXef-L5rIRyhPWdChuOffYier6ZPk5ti9nB9O7mYFVaUtC9YW1qqiJC8UrxxDSOVa0UjlLLcNaBY6QiXXJimkpY6IpVsm6asVV07VVeGH6Kzzd9l6j5WkHu98NlCCCZCt8qaMkZrxaVUA8o3qE1dzgmcXia_MOlLU6LXFvWvRT1Y1Jd6bXFInf4UmGxNcANifd5GmSzpgJGBu9twb7k3c9gCJvXeBvj_e6j8K9lC9tUkDZF_A4Sqj9M</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1221893669</pqid></control><display><type>article</type><title>Online Speaker Clustering Using Incremental Learning of an Ergodic Hidden Markov Model</title><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><source>J-STAGE (Japan Science &amp; Technology Information Aggregator, Electronic) Freely Available Titles - Japanese</source><creator>KOSHINAKA, Takafumi ; NAGATOMO, Kentaro ; SHINODA, Koichi</creator><creatorcontrib>KOSHINAKA, Takafumi ; NAGATOMO, Kentaro ; SHINODA, Koichi</creatorcontrib><description>A novel online speaker clustering method based on a generative model is proposed. It employs an incremental variant of variational Bayesian learning and provides probabilistic (non-deterministic) decisions for each input utterance, on the basis of the history of preceding utterances. It can be expected to be robust against errors in cluster estimation and the classification of utterances, and hence to be applicable to many real-time applications. 
Experimental results show that it produces 50% fewer classification errors than does a conventional online method. They also show that it is possible to reduce the number of speech recognition errors by combining the method with unsupervised speaker adaptation.</description><identifier>ISSN: 0916-8532</identifier><identifier>EISSN: 1745-1361</identifier><identifier>DOI: 10.1587/transinf.E95.D.2469</identifier><language>eng</language><publisher>Oxford: The Institute of Electronics, Information and Communication Engineers</publisher><subject>Applied sciences ; Artificial intelligence ; Classification ; Clustering ; Clusters ; Computer science; control theory; systems ; Errors ; Exact sciences and technology ; HMM ; Information, signal and communications theory ; Learning ; meeting recognition ; model selection ; On-line systems ; Online ; Real time ; Signal and communications theory ; Signal processing ; Signal representation. Spectral analysis ; Signal, noise ; Speech and sound recognition and synthesis. 
Linguistics ; Speech processing ; Telecommunications and information theory ; variational Bayesian learning</subject><ispartof>IEICE Transactions on Information and Systems, 2012/10/01, Vol.E95.D(10), pp.2469-2478</ispartof><rights>2012 The Institute of Electronics, Information and Communication Engineers</rights><rights>2015 INIST-CNRS</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c541t-2d4c190563793bfb207fd5b599c3fbe924f03635ab76c1f0696dbb48988f987a3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,1882,4023,27922,27923,27924</link.rule.ids><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&amp;idt=26414690$$DView record in Pascal Francis$$Hfree_for_read</backlink></links><search><creatorcontrib>KOSHINAKA, Takafumi</creatorcontrib><creatorcontrib>NAGATOMO, Kentaro</creatorcontrib><creatorcontrib>SHINODA, Koichi</creatorcontrib><title>Online Speaker Clustering Using Incremental Learning of an Ergodic Hidden Markov Model</title><title>IEICE Transactions on Information and Systems</title><addtitle>IEICE Trans. Inf. &amp; Syst.</addtitle><description>A novel online speaker clustering method based on a generative model is proposed. It employs an incremental variant of variational Bayesian learning and provides probabilistic (non-deterministic) decisions for each input utterance, on the basis of the history of preceding utterances. It can be expected to be robust against errors in cluster estimation and the classification of utterances, and hence to be applicable to many real-time applications. Experimental results show that it produces 50% fewer classification errors than does a conventional online method. 
They also show that it is possible to reduce the number of speech recognition errors by combining the method with unsupervised speaker adaptation.</description><subject>Applied sciences</subject><subject>Artificial intelligence</subject><subject>Classification</subject><subject>Clustering</subject><subject>Clusters</subject><subject>Computer science; control theory; systems</subject><subject>Errors</subject><subject>Exact sciences and technology</subject><subject>HMM</subject><subject>Information, signal and communications theory</subject><subject>Learning</subject><subject>meeting recognition</subject><subject>model selection</subject><subject>On-line systems</subject><subject>Online</subject><subject>Real time</subject><subject>Signal and communications theory</subject><subject>Signal processing</subject><subject>Signal representation. Spectral analysis</subject><subject>Signal, noise</subject><subject>Speech and sound recognition and synthesis. Linguistics</subject><subject>Speech processing</subject><subject>Telecommunications and information theory</subject><subject>variational Bayesian learning</subject><issn>0916-8532</issn><issn>1745-1361</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2012</creationdate><recordtype>article</recordtype><recordid>eNpdkMtKAzEUhoMoWC9P4CYbwc3UXCaZyVJqvVFx4WUbMpmTGk0zNZkKvr1TqhXcnAOH7z8_fAidUDKmoq7O-2Ri9tGNp0qML8eslGoHjWhVioJySXfRiCgqi1pwto8Ocn4jhNaMihF6eYjBR8CPSzDvkPAkrHIPycc5fs7reRttggXE3gQ8A5Pi-tg5bCKepnnXeotvfNtCxPcmvXef-L5rIRyhPWdChuOffYier6ZPk5ti9nB9O7mYFVaUtC9YW1qqiJC8UrxxDSOVa0UjlLLcNaBY6QiXXJimkpY6IpVsm6asVV07VVeGH6Kzzd9l6j5WkHu98NlCCCZCt8qaMkZrxaVUA8o3qE1dzgmcXia_MOlLU6LXFvWvRT1Y1Jd6bXFInf4UmGxNcANifd5GmSzpgJGBu9twb7k3c9gCJvXeBvj_e6j8K9lC9tUkDZF_A4Sqj9M</recordid><startdate>2012</startdate><enddate>2012</enddate><creator>KOSHINAKA, Takafumi</creator><creator>NAGATOMO, Kentaro</creator><creator>SHINODA, Koichi</creator><general>The Institute of Electronics, Information and Communication 
Engineers</general><general>Oxford University Press</general><scope>IQODW</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>2012</creationdate><title>Online Speaker Clustering Using Incremental Learning of an Ergodic Hidden Markov Model</title><author>KOSHINAKA, Takafumi ; NAGATOMO, Kentaro ; SHINODA, Koichi</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c541t-2d4c190563793bfb207fd5b599c3fbe924f03635ab76c1f0696dbb48988f987a3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2012</creationdate><topic>Applied sciences</topic><topic>Artificial intelligence</topic><topic>Classification</topic><topic>Clustering</topic><topic>Clusters</topic><topic>Computer science; control theory; systems</topic><topic>Errors</topic><topic>Exact sciences and technology</topic><topic>HMM</topic><topic>Information, signal and communications theory</topic><topic>Learning</topic><topic>meeting recognition</topic><topic>model selection</topic><topic>On-line systems</topic><topic>Online</topic><topic>Real time</topic><topic>Signal and communications theory</topic><topic>Signal processing</topic><topic>Signal representation. Spectral analysis</topic><topic>Signal, noise</topic><topic>Speech and sound recognition and synthesis. 
Linguistics</topic><topic>Speech processing</topic><topic>Telecommunications and information theory</topic><topic>variational Bayesian learning</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>KOSHINAKA, Takafumi</creatorcontrib><creatorcontrib>NAGATOMO, Kentaro</creatorcontrib><creatorcontrib>SHINODA, Koichi</creatorcontrib><collection>Pascal-Francis</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>IEICE Transactions on Information and Systems</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>KOSHINAKA, Takafumi</au><au>NAGATOMO, Kentaro</au><au>SHINODA, Koichi</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Online Speaker Clustering Using Incremental Learning of an Ergodic Hidden Markov Model</atitle><jtitle>IEICE Transactions on Information and Systems</jtitle><addtitle>IEICE Trans. Inf. &amp; Syst.</addtitle><date>2012</date><risdate>2012</risdate><volume>E95.D</volume><issue>10</issue><spage>2469</spage><epage>2478</epage><pages>2469-2478</pages><issn>0916-8532</issn><eissn>1745-1361</eissn><abstract>A novel online speaker clustering method based on a generative model is proposed. It employs an incremental variant of variational Bayesian learning and provides probabilistic (non-deterministic) decisions for each input utterance, on the basis of the history of preceding utterances. 
It can be expected to be robust against errors in cluster estimation and the classification of utterances, and hence to be applicable to many real-time applications. Experimental results show that it produces 50% fewer classification errors than does a conventional online method. They also show that it is possible to reduce the number of speech recognition errors by combining the method with unsupervised speaker adaptation.</abstract><cop>Oxford</cop><pub>The Institute of Electronics, Information and Communication Engineers</pub><doi>10.1587/transinf.E95.D.2469</doi><tpages>10</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 0916-8532
ispartof IEICE Transactions on Information and Systems, 2012/10/01, Vol.E95.D(10), pp.2469-2478
issn 0916-8532
1745-1361
language eng
recordid cdi_proquest_miscellaneous_1221893669
source Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; J-STAGE (Japan Science & Technology Information Aggregator, Electronic) Freely Available Titles - Japanese
subjects Applied sciences
Artificial intelligence
Classification
Clustering
Clusters
Computer science; control theory; systems
Errors
Exact sciences and technology
HMM
Information, signal and communications theory
Learning
meeting recognition
model selection
On-line systems
Online
Real time
Signal and communications theory
Signal processing
Signal representation. Spectral analysis
Signal, noise
Speech and sound recognition and synthesis. Linguistics
Speech processing
Telecommunications and information theory
variational Bayesian learning
title Online Speaker Clustering Using Incremental Learning of an Ergodic Hidden Markov Model
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-08T17%3A29%3A44IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Online%20Speaker%20Clustering%20Using%20Incremental%20Learning%20of%20an%20Ergodic%20Hidden%20Markov%20Model&rft.jtitle=IEICE%20Transactions%20on%20Information%20and%20Systems&rft.au=KOSHINAKA,%20Takafumi&rft.date=2012&rft.volume=E95.D&rft.issue=10&rft.spage=2469&rft.epage=2478&rft.pages=2469-2478&rft.issn=0916-8532&rft.eissn=1745-1361&rft_id=info:doi/10.1587/transinf.E95.D.2469&rft_dat=%3Cproquest_cross%3E1221893669%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1221893669&rft_id=info:pmid/&rfr_iscdi=true