A computationally efficient mel-filter bank VAD algorithm for distributed speech recognition systems


Bibliographic details
Published in: EURASIP Journal on Applied Signal Processing 2005-03, Vol.2005 (4), p.487-497, Article 561951
Main authors: VLAJ, Damjan; KOTNIK, Bojan; HORVAT, Bogomir; KACIC, Zdravko
Format: Article
Language: English
Keywords:
Online access: Full text
container_end_page 497
container_issue 4
container_start_page 487
container_title EURASIP Journal on Applied Signal Processing
container_volume 2005
creator VLAJ, Damjan
KOTNIK, Bojan
HORVAT, Bogomir
KACIC, Zdravko
description This paper presents a novel, computationally efficient voice activity detection (VAD) algorithm and emphasizes the importance of such algorithms in distributed speech recognition (DSR) systems. When VAD algorithms are used in telecommunication systems, the required capacity of the speech transmission channel can be reduced by transmitting only the speech parts of the signal. A similar objective can be adopted in DSR systems, where the nonspeech parameters are not sent over the transmission channel. A novel approach is proposed for VAD decisions based on mel-filter bank (MFB) outputs with the so-called Hangover criterion. The proposed MFB VAD algorithm is compared with three VAD algorithms used in the G.729, G.723.1, and DSR (advanced front-end) standards. The tests were carried out on the Aurora 2 database at different signal-to-noise ratios (SNRs). In the speech recognition tests, the proposed MFB VAD outperformed all three standard VAD algorithms over all SNRs, with relative improvements of 14.19% over the G.723.1 VAD, 12.84% over the G.729 VAD, and 4.17% over the DSR VAD.
doi_str_mv 10.1155/ASP.2005.487
format Article
publisher New York, NY: Hindawi Publishing Corporation
date 2005-03-15
tpages 11
rights 2005 INIST-CNRS
toplevel peer_reviewed
oa free_for_read
backlink http://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&idt=16851266 (View record in Pascal Francis)
fulltext fulltext
identifier ISSN: 1110-8657
ispartof EURASIP Journal on Applied Signal Processing, 2005-03, Vol.2005 (4), p.487-497, Article 561951
issn 1110-8657
1687-6180
1687-6180
language eng
recordid cdi_proquest_miscellaneous_28198102
source DOAJ Directory of Open Access Journals; SpringerLink Journals; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; Springer Nature OA Free Journals
subjects Applied sciences
Detection, estimation, filtering, equalization, prediction
Exact sciences and technology
Information, signal and communications theory
Signal and communications theory
Signal processing
Signal, noise
Speech processing
Systems, networks and services of telecommunications
Telecommunications
Telecommunications and information theory
Transmission and modulation (techniques and equipments)
title A computationally efficient mel-filter bank VAD algorithm for distributed speech recognition systems
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-20T11%3A55%3A25IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20computationally%20efficient%20mel-filter%20bank%20VAD%20algorithm%20for%20distributed%20speech%20recognition%20systems&rft.jtitle=EURASIP%20Journal%20on%20Applied%20Signal%20Processing&rft.au=VLAJ,%20Damjan&rft.date=2005-03-15&rft.volume=2005&rft.issue=4&rft.spage=487&rft.epage=497&rft.pages=487-497&rft.artnum=561951&rft.issn=1110-8657&rft.eissn=1687-6180&rft_id=info:doi/10.1155/ASP.2005.487&rft_dat=%3Cproquest_cross%3E28198102%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=28198102&rft_id=info:pmid/&rfr_iscdi=true
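The abstract describes VAD decisions made from mel-filter bank (MFB) outputs and smoothed by a hangover criterion, with only speech-frame parameters sent over the DSR channel. This record does not give the paper's actual decision rule, so the following is only a minimal Python sketch under stated assumptions: the input is already a sequence of per-frame MFB energies, the noise level is estimated from the first `noise_frames` frames, and a frame counts as speech when its mean energy exceeds that estimate by `threshold_db`. The function names `mfb_vad` and `frames_to_send` and all parameter values are hypothetical.

```python
import math
from typing import List, Sequence


def mfb_vad(
    mfb_frames: Sequence[Sequence[float]],
    noise_frames: int = 10,
    threshold_db: float = 6.0,
    hangover: int = 5,
) -> List[bool]:
    """Return one speech (True) / nonspeech (False) flag per frame.

    Hypothetical sketch, not the paper's decision rule:
      noise_frames -- leading frames assumed to be nonspeech (noise estimate)
      threshold_db -- margin above the noise estimate that marks speech
      hangover     -- frames the speech flag is held after the level drops
    """
    if not mfb_frames:
        return []

    def frame_db(frame: Sequence[float]) -> float:
        # Mean energy over the filter-bank channels, in dB; floored to avoid log(0).
        energy = sum(frame) / len(frame) if frame else 0.0
        return 10.0 * math.log10(max(energy, 1e-12))

    levels = [frame_db(f) for f in mfb_frames]
    n = max(1, min(noise_frames, len(levels)))
    noise_db = sum(levels[:n]) / n  # assumes the signal starts with nonspeech

    decisions: List[bool] = []
    hold = 0  # hangover frames still remaining
    for level in levels:
        if level > noise_db + threshold_db:
            decisions.append(True)
            hold = hangover  # re-arm the hangover on every detected speech frame
        elif hold > 0:
            decisions.append(True)  # hangover: keep labelling the frame as speech
            hold -= 1
        else:
            decisions.append(False)
    return decisions


def frames_to_send(decisions: Sequence[bool]) -> List[int]:
    """Indices of frames whose parameters would be transmitted (speech only)."""
    return [i for i, is_speech in enumerate(decisions) if is_speech]


# Synthetic input: 10 quiet, 5 loud, then 10 quiet frames of 23 channels each.
quiet = [1.0] * 23
loud = [100.0] * 23
frames = [quiet] * 10 + [loud] * 5 + [quiet] * 10
flags = mfb_vad(frames, noise_frames=10, threshold_db=6.0, hangover=3)
sent = frames_to_send(flags)
print(len(sent), "of", len(frames), "frames transmitted")  # 8 of 25 frames transmitted
```

With `hangover=3`, the three quiet frames that follow the loud burst are still flagged as speech. Holding the decision this way is a common reason for a hangover: low-energy word endings are not cut off, at the cost of transmitting a few extra frames.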