Strategies for distant speech recognition in reverberant environments
Reverberation and noise are known to severely affect the automatic speech recognition (ASR) performance of speech recorded by distant microphones. Therefore, we must deal with reverberation if we are to realize high-performance hands-free speech recognition. In this paper, we review a recognition sy...
Published in: | EURASIP journal on advances in signal processing 2015-07, Vol.2015 (1), p.1-15, Article 60 |
---|---|
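Main authors: | Delcroix, Marc; Yoshioka, Takuya; Ogawa, Atsunori; Kubo, Yotaro; Fujimoto, Masakiyo; Ito, Nobutaka; Kinoshita, Keisuke; Espi, Miquel; Araki, Shoko; Hori, Takaaki; Nakatani, Tomohiro |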
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 15 |
---|---|
container_issue | 1 |
container_start_page | 1 |
container_title | EURASIP journal on advances in signal processing |
container_volume | 2015 |
creator | Delcroix, Marc; Yoshioka, Takuya; Ogawa, Atsunori; Kubo, Yotaro; Fujimoto, Masakiyo; Ito, Nobutaka; Kinoshita, Keisuke; Espi, Miquel; Araki, Shoko; Hori, Takaaki; Nakatani, Tomohiro |
description | Reverberation and noise are known to severely affect the automatic speech recognition (ASR) performance of speech recorded by distant microphones. Therefore, we must deal with reverberation if we are to realize high-performance hands-free speech recognition. In this paper, we review a recognition system that we developed at our laboratory to deal with reverberant speech. The system consists of a speech enhancement (SE) front-end that employs long-term linear prediction-based dereverberation followed by noise reduction. We combine our SE front-end with an ASR back-end that uses neural networks for acoustic and language modeling. The proposed system achieved top scores on the ASR task of the REVERB challenge. This paper describes the different technologies used in our system and presents detailed experimental results that justify our implementation choices and may provide hints for designing distant ASR systems. |
doi_str_mv | 10.1186/s13634-015-0245-7 |
format | Article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_1808047595</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>1808047595</sourcerecordid><originalsourceid>FETCH-LOGICAL-c3077-672720e3281dc38df29920b3613d5cc99149360db9647c14698e52a7798ddb8e3</originalsourceid><addsrcrecordid>eNp1kE1LAzEQhoMoWKs_wNuCFy-r-dh8HaV-QsGDeg672dma0iY1SQv-e1PWQxE8zQw87zszL0KXBN8QosRtIkywpsaE15g2vJZHaEKEkrUgCh8f9KfoLKUlxlxQTCfo_i3HNsPCQaqGEKvepdz6XKUNgP2sItiw8C674J0v0w5iB3EPgN-5GPwafE7n6GRoVwkufusUfTw-vM-e6_nr08vsbl5bhmVZL6mkGBhVpLdM9QPVmuKOCcJ6bq3WpNFM4L7TopGWNEIr4LSVUqu-7xSwKboefTcxfG0hZbN2ycJq1XoI22TKewo3kmte0Ks_6DJsoy_XGSK0xIIzrQpFRsrGkFKEwWyiW7fx2xBs9rmaMVdTcjX7XI0sGjpqUmH9AuKB87-iHwcxecI</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1697065398</pqid></control><display><type>article</type><title>Strategies for distant speech recognitionin reverberant environments</title><source>DOAJ Directory of Open Access Journals</source><source>Springer Nature OA Free Journals</source><source>Springer Nature - Complete Springer Journals</source><source>Alma/SFX Local Collection</source><creator>Delcroix, Marc ; Yoshioka, Takuya ; Ogawa, Atsunori ; Kubo, Yotaro ; Fujimoto, Masakiyo ; Ito, Nobutaka ; Kinoshita, Keisuke ; Espi, Miquel ; Araki, Shoko ; Hori, Takaaki ; Nakatani, Tomohiro</creator><creatorcontrib>Delcroix, Marc ; Yoshioka, Takuya ; Ogawa, Atsunori ; Kubo, Yotaro ; Fujimoto, Masakiyo ; Ito, Nobutaka ; Kinoshita, Keisuke ; Espi, Miquel ; Araki, Shoko ; Hori, Takaaki ; Nakatani, Tomohiro</creatorcontrib><description>Reverberation and noise are known to severely affect the automatic speech recognition (ASR) performance of speech recorded by distant microphones. Therefore, we must deal with reverberation if we are to realize high-performance hands-free speech recognition. 
In this paper, we review a recognition system that we developed at our laboratory to deal with reverberant speech. The system consists of a speech enhancement (SE) front-end that employs long-term linear prediction-based dereverberation followed by noise reduction. We combine our SE front-end with an ASR back-end that uses neural networks for acoustic and language modeling. The proposed system achieved top scores on the ASR task of the REVERB challenge. This paper describes the different technologies used in our system and presents detailed experimental results that justify our implementation choices and may provide hints for designing distant ASR systems.</description><identifier>ISSN: 1687-6180</identifier><identifier>ISSN: 1687-6172</identifier><identifier>EISSN: 1687-6180</identifier><identifier>DOI: 10.1186/s13634-015-0245-7</identifier><language>eng</language><publisher>Cham: Springer International Publishing</publisher><subject>Acoustic noise ; Engineering ; Mathematical models ; Neural networks ; Quantum Information Technology ; Signal,Image and Speech Processing ; Speech ; Speech processing ; Speech recognition ; Spintronics ; Strategy ; Tasks ; ‘Silencing the Echoes’ – Processing of Reverberant Speech</subject><ispartof>EURASIP journal on advances in signal processing, 2015-07, Vol.2015 (1), p.1-15, Article 60</ispartof><rights>Delcroix et al.; licensee Springer. 2015. 
This is an Open Access article distributed under the terms of the Creative Commons Attribution License( ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</rights><rights>The Author(s) 2015</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c3077-672720e3281dc38df29920b3613d5cc99149360db9647c14698e52a7798ddb8e3</citedby><cites>FETCH-LOGICAL-c3077-672720e3281dc38df29920b3613d5cc99149360db9647c14698e52a7798ddb8e3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1186/s13634-015-0245-7$$EPDF$$P50$$Gspringer$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1186/s13634-015-0245-7$$EHTML$$P50$$Gspringer$$Hfree_for_read</linktohtml><link.rule.ids>314,778,782,862,27907,27908,41103,41471,42172,42540,51302,51559</link.rule.ids></links><search><creatorcontrib>Delcroix, Marc</creatorcontrib><creatorcontrib>Yoshioka, Takuya</creatorcontrib><creatorcontrib>Ogawa, Atsunori</creatorcontrib><creatorcontrib>Kubo, Yotaro</creatorcontrib><creatorcontrib>Fujimoto, Masakiyo</creatorcontrib><creatorcontrib>Ito, Nobutaka</creatorcontrib><creatorcontrib>Kinoshita, Keisuke</creatorcontrib><creatorcontrib>Espi, Miquel</creatorcontrib><creatorcontrib>Araki, Shoko</creatorcontrib><creatorcontrib>Hori, Takaaki</creatorcontrib><creatorcontrib>Nakatani, Tomohiro</creatorcontrib><title>Strategies for distant speech recognitionin reverberant environments</title><title>EURASIP journal on advances in signal processing</title><addtitle>EURASIP J. Adv. Signal Process</addtitle><description>Reverberation and noise are known to severely affect the automatic speech recognition (ASR) performance of speech recorded by distant microphones. 
Therefore, we must deal with reverberation if we are to realize high-performance hands-free speech recognition. In this paper, we review a recognition system that we developed at our laboratory to deal with reverberant speech. The system consists of a speech enhancement (SE) front-end that employs long-term linear prediction-based dereverberation followed by noise reduction. We combine our SE front-end with an ASR back-end that uses neural networks for acoustic and language modeling. The proposed system achieved top scores on the ASR task of the REVERB challenge. This paper describes the different technologies used in our system and presents detailed experimental results that justify our implementation choices and may provide hints for designing distant ASR systems.</description><subject>Acoustic noise</subject><subject>Engineering</subject><subject>Mathematical models</subject><subject>Neural networks</subject><subject>Quantum Information Technology</subject><subject>Signal,Image and Speech Processing</subject><subject>Speech</subject><subject>Speech processing</subject><subject>Speech recognition</subject><subject>Spintronics</subject><subject>Strategy</subject><subject>Tasks</subject><subject>‘Silencing the Echoes’ – Processing of Reverberant 
Speech</subject><issn>1687-6180</issn><issn>1687-6172</issn><issn>1687-6180</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2015</creationdate><recordtype>article</recordtype><sourceid>C6C</sourceid><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><sourceid>GNUQQ</sourceid><recordid>eNp1kE1LAzEQhoMoWKs_wNuCFy-r-dh8HaV-QsGDeg672dma0iY1SQv-e1PWQxE8zQw87zszL0KXBN8QosRtIkywpsaE15g2vJZHaEKEkrUgCh8f9KfoLKUlxlxQTCfo_i3HNsPCQaqGEKvepdz6XKUNgP2sItiw8C674J0v0w5iB3EPgN-5GPwafE7n6GRoVwkufusUfTw-vM-e6_nr08vsbl5bhmVZL6mkGBhVpLdM9QPVmuKOCcJ6bq3WpNFM4L7TopGWNEIr4LSVUqu-7xSwKboefTcxfG0hZbN2ycJq1XoI22TKewo3kmte0Ks_6DJsoy_XGSK0xIIzrQpFRsrGkFKEwWyiW7fx2xBs9rmaMVdTcjX7XI0sGjpqUmH9AuKB87-iHwcxecI</recordid><startdate>20150719</startdate><enddate>20150719</enddate><creator>Delcroix, Marc</creator><creator>Yoshioka, Takuya</creator><creator>Ogawa, Atsunori</creator><creator>Kubo, Yotaro</creator><creator>Fujimoto, Masakiyo</creator><creator>Ito, Nobutaka</creator><creator>Kinoshita, Keisuke</creator><creator>Espi, Miquel</creator><creator>Araki, Shoko</creator><creator>Hori, Takaaki</creator><creator>Nakatani, Tomohiro</creator><general>Springer International Publishing</general><general>Springer Nature 
B.V</general><scope>C6C</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>3V.</scope><scope>7SC</scope><scope>7SP</scope><scope>7XB</scope><scope>8AL</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>8FK</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K7-</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>M0N</scope><scope>P5Z</scope><scope>P62</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>Q9U</scope></search><sort><creationdate>20150719</creationdate><title>Strategies for distant speech recognitionin reverberant environments</title><author>Delcroix, Marc ; Yoshioka, Takuya ; Ogawa, Atsunori ; Kubo, Yotaro ; Fujimoto, Masakiyo ; Ito, Nobutaka ; Kinoshita, Keisuke ; Espi, Miquel ; Araki, Shoko ; Hori, Takaaki ; Nakatani, Tomohiro</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c3077-672720e3281dc38df29920b3613d5cc99149360db9647c14698e52a7798ddb8e3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2015</creationdate><topic>Acoustic noise</topic><topic>Engineering</topic><topic>Mathematical models</topic><topic>Neural networks</topic><topic>Quantum Information Technology</topic><topic>Signal,Image and Speech Processing</topic><topic>Speech</topic><topic>Speech processing</topic><topic>Speech recognition</topic><topic>Spintronics</topic><topic>Strategy</topic><topic>Tasks</topic><topic>‘Silencing the Echoes’ – Processing of Reverberant Speech</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Delcroix, Marc</creatorcontrib><creatorcontrib>Yoshioka, Takuya</creatorcontrib><creatorcontrib>Ogawa, Atsunori</creatorcontrib><creatorcontrib>Kubo, 
Yotaro</creatorcontrib><creatorcontrib>Fujimoto, Masakiyo</creatorcontrib><creatorcontrib>Ito, Nobutaka</creatorcontrib><creatorcontrib>Kinoshita, Keisuke</creatorcontrib><creatorcontrib>Espi, Miquel</creatorcontrib><creatorcontrib>Araki, Shoko</creatorcontrib><creatorcontrib>Hori, Takaaki</creatorcontrib><creatorcontrib>Nakatani, Tomohiro</creatorcontrib><collection>Springer Nature OA Free Journals</collection><collection>CrossRef</collection><collection>ProQuest Central (Corporate)</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics & Communications Abstracts</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Computing Database (Alumni Edition)</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>Advanced Technologies & Aerospace Collection</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection (ProQuest)</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>Computer Science Database</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Computing Database</collection><collection>Advanced Technologies & Aerospace Database</collection><collection>ProQuest 
Advanced Technologies & Aerospace Collection</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>ProQuest Central Basic</collection><jtitle>EURASIP journal on advances in signal processing</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Delcroix, Marc</au><au>Yoshioka, Takuya</au><au>Ogawa, Atsunori</au><au>Kubo, Yotaro</au><au>Fujimoto, Masakiyo</au><au>Ito, Nobutaka</au><au>Kinoshita, Keisuke</au><au>Espi, Miquel</au><au>Araki, Shoko</au><au>Hori, Takaaki</au><au>Nakatani, Tomohiro</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Strategies for distant speech recognitionin reverberant environments</atitle><jtitle>EURASIP journal on advances in signal processing</jtitle><stitle>EURASIP J. Adv. Signal Process</stitle><date>2015-07-19</date><risdate>2015</risdate><volume>2015</volume><issue>1</issue><spage>1</spage><epage>15</epage><pages>1-15</pages><artnum>60</artnum><issn>1687-6180</issn><issn>1687-6172</issn><eissn>1687-6180</eissn><abstract>Reverberation and noise are known to severely affect the automatic speech recognition (ASR) performance of speech recorded by distant microphones. Therefore, we must deal with reverberation if we are to realize high-performance hands-free speech recognition. In this paper, we review a recognition system that we developed at our laboratory to deal with reverberant speech. The system consists of a speech enhancement (SE) front-end that employs long-term linear prediction-based dereverberation followed by noise reduction. We combine our SE front-end with an ASR back-end that uses neural networks for acoustic and language modeling. 
The proposed system achieved top scores on the ASR task of the REVERB challenge. This paper describes the different technologies used in our system and presents detailed experimental results that justify our implementation choices and may provide hints for designing distant ASR systems.</abstract><cop>Cham</cop><pub>Springer International Publishing</pub><doi>10.1186/s13634-015-0245-7</doi><tpages>15</tpages><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1687-6180 |
ispartof | EURASIP journal on advances in signal processing, 2015-07, Vol.2015 (1), p.1-15, Article 60 |
issn | 1687-6180 1687-6172 1687-6180 |
language | eng |
recordid | cdi_proquest_miscellaneous_1808047595 |
source | DOAJ Directory of Open Access Journals; Springer Nature OA Free Journals; Springer Nature - Complete Springer Journals; Alma/SFX Local Collection |
subjects | Acoustic noise; Engineering; Mathematical models; Neural networks; Quantum Information Technology; Signal, Image and Speech Processing; Speech; Speech processing; Speech recognition; Spintronics; Strategy; Tasks; ‘Silencing the Echoes’ – Processing of Reverberant Speech |
title | Strategies for distant speech recognition in reverberant environments |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-16T13%3A16%3A18IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Strategies%20for%20distant%20speech%20recognitionin%20reverberant%20environments&rft.jtitle=EURASIP%20journal%20on%20advances%20in%20signal%20processing&rft.au=Delcroix,%20Marc&rft.date=2015-07-19&rft.volume=2015&rft.issue=1&rft.spage=1&rft.epage=15&rft.pages=1-15&rft.artnum=60&rft.issn=1687-6180&rft.eissn=1687-6180&rft_id=info:doi/10.1186/s13634-015-0245-7&rft_dat=%3Cproquest_cross%3E1808047595%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1697065398&rft_id=info:pmid/&rfr_iscdi=true |