Short-Term Spatio-Temporal Clustering Applied to Multiple Moving Speakers

Distant microphones permit to process spontaneous multiparty speech with very little constraints on speakers, as opposed to close-talking microphones. Minimizing the constraints on speakers permits a large diversity of applications, including meeting summarization and browsing, surveillance, hearing...

Detailed description

Bibliographic details
Published in: IEEE transactions on audio, speech, and language processing, 2007-07, Vol.15 (5), p.1696-1710
Main authors: Lathoud, G., Odobez, J.-M.
Format: Article
Language: eng
container_end_page 1710
container_issue 5
container_start_page 1696
container_title IEEE transactions on audio, speech, and language processing
container_volume 15
creator Lathoud, G.
Odobez, J.-M.
description Distant microphones permit to process spontaneous multiparty speech with very little constraints on speakers, as opposed to close-talking microphones. Minimizing the constraints on speakers permits a large diversity of applications, including meeting summarization and browsing, surveillance, hearing aids, and more natural human-machine interaction. Such applications of distant microphones require to determine where and when the speakers are talking. This is inherently a multisource problem, because of background noise sources, as well as the natural tendency of multiple speakers to talk over each other. Moreover, spontaneous speech utterances are highly discontinuous, which makes it difficult to track the multiple speakers with classical filtering approaches, such as Kalman filtering or particle filters. As an alternative, this paper proposes a probabilistic framework to determine the trajectories of multiple moving speakers in the short-term only, i.e., only while they speak. Instantaneous location estimates that are close in space and time are grouped into "short-term clusters" in a principled manner. Each short-term cluster determines the precise start and end times of an utterance and a short-term spatial trajectory. Contrastive experiments clearly show the benefit of using short-term clustering, on real indoor recordings with seated speakers in meetings, as well as multiple moving speakers.
doi_str_mv 10.1109/TASL.2007.896667
format Article
publisher Piscataway, NJ: IEEE
coden ITASD8
eissn 1558-7924
2329-9304
ieee_id 4244525
pqid 917430790
rights 2007 INIST-CNRS
Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2007
toplevel peer_reviewed
online_resources
genre article
date 2007-07-01
tpages 15
linktohtml https://ieeexplore.ieee.org/document/4244525
backlink http://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&idt=18873195
collection IEEE All-Society Periodicals Package (ASPP) 2005-present
IEEE All-Society Periodicals Package (ASPP) 1998-Present
IEEE Electronic Library (IEL)
Pascal-Francis
CrossRef
Computer and Information Systems Abstracts
Technology Research Database
ProQuest Computer Science Collection
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts – Academic
Computer and Information Systems Abstracts Professional
fulltext fulltext_linktorsrc
identifier ISSN: 1558-7916
ispartof IEEE transactions on audio, speech, and language processing, 2007-07, Vol.15 (5), p.1696-1710
issn 1558-7916
2329-9290
1558-7924
2329-9304
language eng
recordid cdi_crossref_primary_10_1109_TASL_2007_896667
source IEEE Electronic Library (IEL)
subjects Applied sciences
Background noise
Detection, estimation, filtering, equalization, prediction
Exact sciences and technology
Filtering
Hearing aids
Information, signal and communications theory
Kalman filters
Localization
Man machine systems
Microphones
multiple acoustic sources
Particle filters
Particle tracking
short-term clustering
Signal and communications theory
Signal processing
Signal representation. Spectral analysis
Signal, noise
Speech processing
speech segmentation
Surveillance
Telecommunications and information theory
tracking
title Short-Term Spatio-Temporal Clustering Applied to Multiple Moving Speakers
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-22T19%3A37%3A48IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Short-Term%20Spatio-Temporal%20Clustering%20Applied%20to%20Multiple%20Moving%20Speakers&rft.jtitle=IEEE%20transactions%20on%20audio,%20speech,%20and%20language%20processing&rft.au=Lathoud,%20G.&rft.date=2007-07-01&rft.volume=15&rft.issue=5&rft.spage=1696&rft.epage=1710&rft.pages=1696-1710&rft.issn=1558-7916&rft.eissn=1558-7924&rft.coden=ITASD8&rft_id=info:doi/10.1109/TASL.2007.896667&rft_dat=%3Cproquest_RIE%3E2568814031%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=917430790&rft_id=info:pmid/&rft_ieee_id=4244525&rfr_iscdi=true