A computationally efficient mel-filter bank VAD algorithm for distributed speech recognition systems

Detailed description

Bibliographic details
Published in:	EURASIP Journal on Applied Signal Processing 2005-03, Vol.2005 (4), p.487-497, Article 561951
Main authors:	VLAJ, Damjan; KOTNIK, Bojan; HORVAT, Bogomir; KACIC, Zdravko
Format:	Article
Language:	eng
Subjects:	Applied sciences; Detection, estimation, filtering, equalization, prediction; Exact sciences and technology; Information, signal and communications theory; Signal and communications theory; Signal processing; Signal, noise; Speech processing; Systems, networks and services of telecommunications; Telecommunications; Telecommunications and information theory; Transmission and modulation (techniques and equipments)
Online access:	Full text

container_end_page	497
container_issue	4
container_start_page	487
container_title	EURASIP Journal on Applied Signal Processing
container_volume	2005
creator	VLAJ, Damjan ; KOTNIK, Bojan ; HORVAT, Bogomir ; KACIC, Zdravko
description	This paper presents a novel, computationally efficient voice activity detection (VAD) algorithm and emphasizes the importance of such algorithms in distributed speech recognition (DSR) systems. When VAD algorithms are used in telecommunication systems, the required capacity of the speech transmission channel can be reduced if only the speech parts of the signal are transmitted. A similar objective can be adopted in DSR systems, where the nonspeech parameters are not sent over the transmission channel. A novel approach is proposed for VAD decisions based on mel-filter bank (MFB) outputs with the so-called hangover criterion. Comparative tests are presented between the proposed MFB VAD algorithm and three VAD algorithms used in the G.729, G.723.1, and DSR (advanced front-end) standards. These tests were made on the Aurora 2 database at different signal-to-noise ratios (SNRs). In the speech recognition tests, the proposed MFB VAD outperformed all three VAD algorithms used in the standards: by 14.19% relative (G.723.1 VAD), by 12.84% relative (G.729 VAD), and by 4.17% relative (DSR VAD) across all SNRs.
doi_str_mv	10.1155/ASP.2005.487
format	Article
artnum	561951
backlink	http://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&idt=16851266 (View record in Pascal Francis)
date	2005-03-15
eissn	1687-6180
oa	free_for_read
publisher	Hindawi Publishing Corporation, New York, NY
rights	2005 INIST-CNRS
toplevel	peer_reviewed; online_resources
tpages	11
fulltext	fulltext
identifier	ISSN: 1110-8657
ispartof	EURASIP Journal on Applied Signal Processing, 2005-03, Vol.2005 (4), p.487-497, Article 561951
issn	1110-8657; 1687-6180
language	eng
recordid	cdi_proquest_miscellaneous_28198102
source	DOAJ Directory of Open Access Journals; SpringerLink Journals; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; Springer Nature OA Free Journals
subjects	Applied sciences; Detection, estimation, filtering, equalization, prediction; Exact sciences and technology; Information, signal and communications theory; Signal and communications theory; Signal processing; Signal, noise; Speech processing; Systems, networks and services of telecommunications; Telecommunications; Telecommunications and information theory; Transmission and modulation (techniques and equipments)
title	A computationally efficient mel-filter bank VAD algorithm for distributed speech recognition systems
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-20T11%3A55%3A25IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20computationally%20efficient%20mel-filter%20bank%20VAD%20algorithm%20for%20distributed%20speech%20recognition%20systems&rft.jtitle=EURASIP%20Journal%20on%20Applied%20Signal%20Processing&rft.au=VLAJ,%20Damjan&rft.date=2005-03-15&rft.volume=2005&rft.issue=4&rft.spage=487&rft.epage=497&rft.pages=487-497&rft.artnum=561951&rft.issn=1110-8657&rft.eissn=1687-6180&rft_id=info:doi/10.1155/ASP.2005.487&rft_dat=%3Cproquest_cross%3E28198102%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=28198102&rft_id=info:pmid/&rfr_iscdi=true