Multistream sparse representation features for noise robust audio-visual speech recognition


Detailed description

Bibliographic details
Published in: Acoustical Science and Technology, 2014-01-01, Vol. 35(1), pp. 17-27
Main authors: Shen, Peng; Tamura, Satoshi; Hayamizu, Satoru
Format: Article
Language: English
Online access: Full text
Description: In this paper, we propose exemplar-based sparse representation features for noise-robust audio-visual speech recognition. First, we introduce sparse representation techniques and describe how noise robustness can be achieved by using the sparse representation for noise reduction. Then, feature fusion methods are proposed to combine audio and visual features with the sparse representation. Our work provides new insight into two crucial issues in automatic speech recognition: noise reduction and robust audio-visual features. For noise reduction, we describe a method in which speech and noise are mapped into different subspaces by the sparse representation, so that the noise component can be discarded. The proposed method can be applied not only to audio noise reduction but also to visual noise reduction for several types of noise. For the second issue, we investigate two feature fusion methods, late feature fusion and the joint sparsity model, to calculate audio-visual sparse representation features and improve the accuracy of audio-visual speech recognition. The proposed method can also contribute to feature fusion for audio-visual speech recognition systems. Finally, to evaluate the new sparse representation features, a database for audio-visual speech recognition is used in this research. We show the effectiveness of the proposed noise reduction in both the audio and visual cases for several types of noise, and the effectiveness of audio-visual feature determination by the joint sparsity model, in comparison with the late feature fusion method and traditional methods.
DOI: 10.1250/ast.35.17
Publisher: Acoustical Society of Japan (Tokyo)
Rights: 2014 by The Acoustical Society of Japan; Copyright Japan Science and Technology Agency, 2014
ISSN: 1346-3969
EISSN: 1347-5177
Source: J-STAGE
Subjects: Audio-visual speech recognition; Joint sparsity model; Noise reduction; Sparse representation
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-24T19%3A12%3A57IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Multistream%20sparse%20representation%20features%20for%20noise%20robust%20audio-visual%20speech%20recognition&rft.jtitle=Acoustical%20Science%20and%20Technology&rft.au=Shen,%20Peng&rft.date=2014-01-01&rft.volume=35&rft.issue=1&rft.spage=17&rft.epage=27&rft.pages=17-27&rft.issn=1346-3969&rft.eissn=1347-5177&rft_id=info:doi/10.1250/ast.35.17&rft_dat=%3Cproquest_cross%3E3184358661%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1478017760&rft_id=info:pmid/&rfr_iscdi=true
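The abstract describes mapping speech and noise into different subspaces via a sparse representation over exemplar dictionaries, and a joint sparsity model in which one shared sparse code explains both the audio and visual streams. The following is a minimal illustrative sketch of that general idea, not the authors' implementation: it builds synthetic speech and noise exemplar dictionaries, solves a lasso problem with ISTA over the combined dictionary [S | N], and keeps only the speech-subspace part of the reconstruction. All dictionary sizes, exemplars, and the `lam` parameter are arbitrary assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    # Element-wise soft-thresholding, the proximal operator of the L1 norm.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_code(A, y, lam=0.05, n_iter=500):
    # Solve min_x 0.5*||A x - y||^2 + lam*||x||_1 with ISTA.
    L = np.linalg.norm(A, 2) ** 2        # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = soft_threshold(x - (A.T @ (A @ x - y)) / L, lam / L)
    return x

rng = np.random.default_rng(0)
S = rng.standard_normal((40, 30))        # synthetic speech exemplars (columns)
N = rng.standard_normal((40, 30))        # synthetic noise exemplars (columns)
A = np.hstack([S, N])                    # combined dictionary [S | N]

# Observation: a sparse mix of speech exemplars corrupted by a noise exemplar.
speech = S[:, 0] + 0.5 * S[:, 7]
noisy = speech + 0.4 * N[:, 3]

x = sparse_code(A, noisy)
speech_hat = S @ x[:30]                  # keep only the speech-subspace part

# Joint sparsity fusion sketch: stack audio and (hypothetical) visual
# observations and dictionaries so that a single shared sparse code explains
# both streams; that shared code serves as the audio-visual feature.
V = rng.standard_normal((20, 30))        # hypothetical visual exemplars
visual = V[:, 0] + 0.5 * V[:, 7]         # same support as the audio stream
A_av = np.vstack([S, V])
x_av = sparse_code(A_av, np.concatenate([speech, visual]))
```

Because the noise component is absorbed by the noise-subspace coefficients `x[30:]`, discarding them and reconstructing from `x[:30]` alone acts as the denoising step; the same `sparse_code` routine is reused unchanged for the stacked audio-visual problem.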