Convolutional Neural Network and Feature Transformation for Distant Speech Recognition


Detailed Description

Saved in:
Bibliographic Details
Published in: International journal of electrical and computer engineering (Malacca, Malacca), 2018-12, Vol.8 (6), p.5381
Main authors: Pardede, Hilman F., Yuliani, Asri R., Sustika, Rika
Format: Article
Language: English
Online access: Full text
Description: In many applications, speech recognition must operate in conditions where there is some distance between the speakers and the microphones. This is called distant speech recognition (DSR). In this condition, speech recognition must deal with reverberation. Nowadays, deep learning technologies are becoming the main technologies for speech recognition. A Deep Neural Network (DNN) in hybrid with a Hidden Markov Model (HMM) is the commonly used architecture. However, this system is still not robust against reverberation. Previous studies use Convolutional Neural Networks (CNN), a variation of neural networks, to improve the robustness of speech recognition against noise. CNN has a pooling property that is used to find local correlations between neighboring dimensions in the features. With this property, CNN can be used for feature learning that emphasizes the information in neighboring frames. In this study we use CNN to deal with reverberation. We also propose to apply feature transformation techniques, linear discriminant analysis (LDA) and maximum likelihood linear transformation (MLLT), to mel-frequency cepstral coefficients (MFCC) before feeding them to the CNN. We argue that transforming the features could produce more discriminative features for the CNN, and hence improve the robustness of speech recognition against reverberation. Our evaluations on the Meeting Recorder Digits (MRD) subset of the Aurora-5 database confirm that the use of LDA and MLLT transformations improves the robustness of speech recognition: it achieves a 20% relative error reduction compared to a standard DNN-based speech recognition system using the same number of hidden layers.
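The front end the abstract describes applies an LDA projection to MFCC frames before they reach the CNN. The sketch below shows the LDA step only, on synthetic data; the frame counts, feature dimensions, and class labels are illustrative assumptions, not the authors' actual configuration (acoustic-model pipelines of this kind are typically built with a toolkit such as Kaldi, and MLLT is a further decorrelating transform estimated on top of LDA, omitted here).

```python
import numpy as np

def lda_transform(feats, labels, out_dim):
    """Fisher LDA: find directions that maximize between-class over
    within-class scatter (numpy-only, illustrative)."""
    classes = np.unique(labels)
    mean = feats.mean(axis=0)
    d = feats.shape[1]
    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for c in classes:
        fc = feats[labels == c]
        mc = fc.mean(axis=0)
        Sw += (fc - mc).T @ (fc - mc)
        diff = (mc - mean)[:, None]
        Sb += len(fc) * (diff @ diff.T)
    # eigenvectors of pinv(Sw) @ Sb, keep the top out_dim directions
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(-eigvals.real)
    return eigvecs.real[:, order[:out_dim]]

# toy stand-in for MFCC frames: 200 frames x 13 coefficients,
# with 3 phone-like classes shifting the feature means
rng = np.random.default_rng(0)
labels = rng.integers(0, 3, size=200)
feats = rng.normal(size=(200, 13)) + labels[:, None] * 0.5

W = lda_transform(feats, labels, out_dim=6)
projected = feats @ W   # reduced, more discriminative features for the CNN
print(projected.shape)  # (200, 6)
```

In the paper's setting, the class labels would come from frame-level HMM state alignments, and the projected frames (optionally MLLT-rotated) would be stacked into time-frequency patches for the CNN's convolution and pooling layers.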
DOI: 10.11591/ijece.v8i6.pp5381-5388
Publisher: IAES Institute of Advanced Engineering and Science, Yogyakarta
ISSN: 2088-8708
EISSN: 2088-8708
Source: Elektronische Zeitschriftenbibliothek - Freely accessible e-journals
Subjects:
Artificial neural networks
Digits
Error reduction
Feature recognition
Linear transformations
Machine learning
Markov chains
Microphones
Neural networks
Robustness
Speech
Speech recognition
Voice recognition