Low-Latency Real-Time Meeting Recognition and Understanding Using Distant Microphones and Omni-Directional Camera

This paper presents our real-time meeting analyzer for monitoring conversations in an ongoing group meeting. The goal of the system is to recognize automatically "who is speaking what" in an online manner for meeting assistance. Our system continuously captures the utterances and face pose...

Detailed description

Saved in:
Bibliographic details
Published in: IEEE transactions on audio, speech, and language processing, 2012-02, Vol.20 (2), p.499-513
Main authors: Hori, T., Araki, S., Yoshioka, T., Fujimoto, M., Watanabe, S., Oba, T., Ogawa, A., Otsuka, K., Mikami, D., Kinoshita, K., Nakatani, T., Nakamura, A., Yamato, J.
Format: Article
Language: eng
Subjects:
Online access: order full text
description This paper presents our real-time meeting analyzer for monitoring conversations in an ongoing group meeting. The goal of the system is to recognize automatically "who is speaking what" in an online manner for meeting assistance. Our system continuously captures the utterances and face poses of each speaker using a microphone array and an omni-directional camera positioned at the center of the meeting table. Through a series of advanced audio processing operations, an overlapping speech signal is enhanced and the components are separated into individual speaker's channels. Then the utterances are sequentially transcribed by our speech recognizer with low latency. In parallel with speech recognition, the activity of each participant (e.g., speaking, laughing, watching someone) and the circumstances of the meeting (e.g., topic, activeness, casualness) are detected and displayed on a browser together with the transcripts. In this paper, we describe our techniques and our attempt to achieve the low-latency monitoring of meetings, and we show our experimental results for real-time meeting transcription.
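The description above outlines an audio front end in which a centrally placed microphone array enhances overlapping speech before separation and recognition. As a rough illustration of the array-processing idea only (the paper's actual system uses considerably more advanced enhancement and source-separation techniques), a minimal delay-and-sum beamformer in Python might look like this; the function name and toy signals are illustrative, not from the paper:

```python
import numpy as np

def delay_and_sum(signals, delays, fs):
    """Align each microphone channel by its steering delay and average.

    signals: (n_mics, n_samples) array of synchronized recordings
    delays:  per-microphone arrival delays in seconds toward the target speaker
    fs:      sampling rate in Hz
    """
    n_mics, n_samples = signals.shape
    out = np.zeros(n_samples)
    for sig, d in zip(signals, delays):
        shift = int(round(d * fs))   # integer-sample approximation of the delay
        out += np.roll(sig, -shift)  # advance the channel so all copies align
    # Averaging adds the target speech coherently while uncorrelated
    # noise from other directions partially cancels.
    return out / n_mics

# Toy usage: two channels carrying the same pulse, the second arriving
# 3 samples later (as it would at a more distant microphone).
fs = 16000
pulse = np.zeros(100)
pulse[10] = 1.0
delayed = np.roll(pulse, 3)
enhanced = delay_and_sum(np.stack([pulse, delayed]), [0.0, 3 / fs], fs)
```

After alignment, both copies of the pulse coincide at sample 10, so the average preserves the target at full amplitude, which is the basic mechanism a steered array exploits.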
doi 10.1109/TASL.2011.2164527
issn 1558-7916
2329-9290
eissn 1558-7924
2329-9304
source IEEE Electronic Library (IEL)
subjects Applied sciences
Browsers
Cameras
Distant microphones
Exact sciences and technology
Information, signal and communications theory
meeting analysis
Meetings
Microphones
Miscellaneous
Monitoring
Pattern recognition
Real time
Real-time systems
Recognition
Signal processing
speaker diarization
Speech
speech enhancement
Speech processing
Speech recognition
Studies
Telecommunications and information theory
topic tracking