A computationally efficient mel-filter bank VAD algorithm for distributed speech recognition systems
This paper presents a novel computationally efficient voice activity detection (VAD) algorithm and emphasizes the importance of such algorithms in distributed speech recognition (DSR) systems. When using VAD algorithms in telecommunication systems, the required capacity of the speech transmission ch...
Published in: | EURASIP Journal on Applied Signal Processing 2005-03, Vol.2005 (4), p.487-497, Article 561951 |
---|---|
Main authors: | VLAJ, Damjan; KOTNIK, Bojan; HORVAT, Bogomir; KACIC, Zdravko |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 497 |
---|---|
container_issue | 4 |
container_start_page | 487 |
container_title | EURASIP Journal on Applied Signal Processing |
container_volume | 2005 |
creator | VLAJ, Damjan; KOTNIK, Bojan; HORVAT, Bogomir; KACIC, Zdravko |
description | This paper presents a novel, computationally efficient voice activity detection (VAD) algorithm and emphasizes the importance of such algorithms in distributed speech recognition (DSR) systems. When VAD algorithms are used in telecommunication systems, the required capacity of the speech transmission channel can be reduced if only the speech parts of the signal are transmitted. A similar objective can be adopted in DSR systems, where the nonspeech parameters are not sent over the transmission channel. A novel approach is proposed for VAD decisions based on mel-filter bank (MFB) outputs with the so-called hangover criterion. The proposed MFB VAD algorithm is compared with three VAD algorithms used in the G.729, G.723.1, and DSR (advanced front-end) standards. These tests were made on the Aurora 2 database at different signal-to-noise ratios (SNRs). In the speech recognition tests, the proposed MFB VAD outperformed all three VAD algorithms used in the standards: by 14.19% relative (G.723.1 VAD), by 12.84% relative (G.729 VAD), and by 4.17% relative (DSR VAD) across all SNRs. |
doi_str_mv | 10.1155/ASP.2005.487 |
format | Article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_28198102</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>28198102</sourcerecordid><originalsourceid>FETCH-LOGICAL-c334t-9af079ea1ef1628cfb2fffda4ccf16e4b969e34aa9512cdc75769d7af47507f73</originalsourceid><addsrcrecordid>eNpFkMtqwzAQRUVpoSHNrh-gTbuqU8m2Hl6a9AmBFvrYClkeJaLyo5KyyN_XIYGuZhjOvQwHoWtKlpQydl9_vC9zQtiylOIMzSiXIuNUkvNpp5RkkjNxiRYxuoYQTpmQQsxQW2MzdOMu6eSGXnu_x2CtMw76hDvwmXU-QcCN7n_wd_2Atd8MwaVth-0QcOtiCq7ZJWhxHAHMFgcww6Z3hzoc9zFBF6_QhdU-wuI05-jr6fFz9ZKt355fV_U6M0VRpqzSlogKNAVLeS6NbXJrbatLY6YDlE3FKyhKrStGc9MawQSvWqFtKRgRVhRzdHvsHcPwu4OYVOeiAe91D8MuqlzSSlKST-DdETRhiDGAVWNwnQ57RYk62FSTTXWwqSabE35z6tXRaG-D7o2L_xkup4c4L_4ASF926g</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>28198102</pqid></control><display><type>article</type><title>A computationally efficient mel-filter bank VAD algorithm for distributed speech recognition systems</title><source>DOAJ Directory of Open Access Journals</source><source>SpringerLink Journals</source><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><source>Springer Nature OA Free Journals</source><creator>VLAJ, Damjan ; KOTNIK, Bojan ; HORVAT, Bogomir ; KACIC, Zdravko</creator><creatorcontrib>VLAJ, Damjan ; KOTNIK, Bojan ; HORVAT, Bogomir ; KACIC, Zdravko</creatorcontrib><description>This paper presents a novel computationally efficient voice activity detection (VAD) algorithm and emphasizes the importance of such algorithms in distributed speech recognition (DSR) systems. When using VAD algorithms in telecommunication systems, the required capacity of the speech transmission channel can be reduced if only the speech parts of the signal are transmitted. A similar objective can be adopted in DSR systems, where the nonspeech parameters are not sent over the transmission channel. 
A novel approach is proposed for VAD decisions based on mel-filter bank (MFB) outputs with the so-called Hangover criterion. Comparative tests are presented between the presented MFB VAD algorithm and three VAD algorithms used in the G.729, G.723.1, and DSR (advanced front-end) Standards. These tests were made on the Aurora 2 database, with different signal-to-noise (SNRs) ratios. In the speech recognition tests, the proposed MFB VAD outperformed all the three VAD algorithms used in the standards by 14.19% relative (G.723.1 VAD), by 12.84% relative (G.729 VAD), and by 4.17% relative (DSR VAD) in all SNRs.</description><identifier>ISSN: 1110-8657</identifier><identifier>ISSN: 1687-6180</identifier><identifier>EISSN: 1687-6180</identifier><identifier>DOI: 10.1155/ASP.2005.487</identifier><language>eng</language><publisher>New York, NY: Hindawi Publishing Corporation</publisher><subject>Applied sciences ; Detection, estimation, filtering, equalization, prediction ; Exact sciences and technology ; Information, signal and communications theory ; Signal and communications theory ; Signal processing ; Signal, noise ; Speech processing ; Systems, networks and services of telecommunications ; Telecommunications ; Telecommunications and information theory ; Transmission and modulation (techniques and equipments)</subject><ispartof>EURASIP Journal on Applied Signal Processing, 2005-03, Vol.2005 (4), p.487-497, Article 561951</ispartof><rights>2005 
INIST-CNRS</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c334t-9af079ea1ef1628cfb2fffda4ccf16e4b969e34aa9512cdc75769d7af47507f73</citedby></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,776,780,860,27901,27902</link.rule.ids><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&idt=16851266$$DView record in Pascal Francis$$Hfree_for_read</backlink></links><search><creatorcontrib>VLAJ, Damjan</creatorcontrib><creatorcontrib>KOTNIK, Bojan</creatorcontrib><creatorcontrib>HORVAT, Bogomir</creatorcontrib><creatorcontrib>KACIC, Zdravko</creatorcontrib><title>A computationally efficient mel-filter bank VAD algorithm for distributed speech recognition systems</title><title>EURASIP Journal on Applied Signal Processing</title><description>This paper presents a novel computationally efficient voice activity detection (VAD) algorithm and emphasizes the importance of such algorithms in distributed speech recognition (DSR) systems. When using VAD algorithms in telecommunication systems, the required capacity of the speech transmission channel can be reduced if only the speech parts of the signal are transmitted. A similar objective can be adopted in DSR systems, where the nonspeech parameters are not sent over the transmission channel. A novel approach is proposed for VAD decisions based on mel-filter bank (MFB) outputs with the so-called Hangover criterion. Comparative tests are presented between the presented MFB VAD algorithm and three VAD algorithms used in the G.729, G.723.1, and DSR (advanced front-end) Standards. These tests were made on the Aurora 2 database, with different signal-to-noise (SNRs) ratios. 
In the speech recognition tests, the proposed MFB VAD outperformed all the three VAD algorithms used in the standards by 14.19% relative (G.723.1 VAD), by 12.84% relative (G.729 VAD), and by 4.17% relative (DSR VAD) in all SNRs.</description><subject>Applied sciences</subject><subject>Detection, estimation, filtering, equalization, prediction</subject><subject>Exact sciences and technology</subject><subject>Information, signal and communications theory</subject><subject>Signal and communications theory</subject><subject>Signal processing</subject><subject>Signal, noise</subject><subject>Speech processing</subject><subject>Systems, networks and services of telecommunications</subject><subject>Telecommunications</subject><subject>Telecommunications and information theory</subject><subject>Transmission and modulation (techniques and equipments)</subject><issn>1110-8657</issn><issn>1687-6180</issn><issn>1687-6180</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2005</creationdate><recordtype>article</recordtype><recordid>eNpFkMtqwzAQRUVpoSHNrh-gTbuqU8m2Hl6a9AmBFvrYClkeJaLyo5KyyN_XIYGuZhjOvQwHoWtKlpQydl9_vC9zQtiylOIMzSiXIuNUkvNpp5RkkjNxiRYxuoYQTpmQQsxQW2MzdOMu6eSGXnu_x2CtMw76hDvwmXU-QcCN7n_wd_2Atd8MwaVth-0QcOtiCq7ZJWhxHAHMFgcww6Z3hzoc9zFBF6_QhdU-wuI05-jr6fFz9ZKt355fV_U6M0VRpqzSlogKNAVLeS6NbXJrbatLY6YDlE3FKyhKrStGc9MawQSvWqFtKRgRVhRzdHvsHcPwu4OYVOeiAe91D8MuqlzSSlKST-DdETRhiDGAVWNwnQ57RYk62FSTTXWwqSabE35z6tXRaG-D7o2L_xkup4c4L_4ASF926g</recordid><startdate>20050315</startdate><enddate>20050315</enddate><creator>VLAJ, Damjan</creator><creator>KOTNIK, Bojan</creator><creator>HORVAT, Bogomir</creator><creator>KACIC, Zdravko</creator><general>Hindawi Publishing Corporation</general><scope>IQODW</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>20050315</creationdate><title>A computationally efficient 
mel-filter bank VAD algorithm for distributed speech recognition systems</title><author>VLAJ, Damjan ; KOTNIK, Bojan ; HORVAT, Bogomir ; KACIC, Zdravko</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c334t-9af079ea1ef1628cfb2fffda4ccf16e4b969e34aa9512cdc75769d7af47507f73</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2005</creationdate><topic>Applied sciences</topic><topic>Detection, estimation, filtering, equalization, prediction</topic><topic>Exact sciences and technology</topic><topic>Information, signal and communications theory</topic><topic>Signal and communications theory</topic><topic>Signal processing</topic><topic>Signal, noise</topic><topic>Speech processing</topic><topic>Systems, networks and services of telecommunications</topic><topic>Telecommunications</topic><topic>Telecommunications and information theory</topic><topic>Transmission and modulation (techniques and equipments)</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>VLAJ, Damjan</creatorcontrib><creatorcontrib>KOTNIK, Bojan</creatorcontrib><creatorcontrib>HORVAT, Bogomir</creatorcontrib><creatorcontrib>KACIC, Zdravko</creatorcontrib><collection>Pascal-Francis</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics & Communications Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>EURASIP Journal on Applied Signal Processing</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>VLAJ, 
Damjan</au><au>KOTNIK, Bojan</au><au>HORVAT, Bogomir</au><au>KACIC, Zdravko</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A computationally efficient mel-filter bank VAD algorithm for distributed speech recognition systems</atitle><jtitle>EURASIP Journal on Applied Signal Processing</jtitle><date>2005-03-15</date><risdate>2005</risdate><volume>2005</volume><issue>4</issue><spage>487</spage><epage>497</epage><pages>487-497</pages><artnum>561951</artnum><issn>1110-8657</issn><issn>1687-6180</issn><eissn>1687-6180</eissn><abstract>This paper presents a novel computationally efficient voice activity detection (VAD) algorithm and emphasizes the importance of such algorithms in distributed speech recognition (DSR) systems. When using VAD algorithms in telecommunication systems, the required capacity of the speech transmission channel can be reduced if only the speech parts of the signal are transmitted. A similar objective can be adopted in DSR systems, where the nonspeech parameters are not sent over the transmission channel. A novel approach is proposed for VAD decisions based on mel-filter bank (MFB) outputs with the so-called Hangover criterion. Comparative tests are presented between the presented MFB VAD algorithm and three VAD algorithms used in the G.729, G.723.1, and DSR (advanced front-end) Standards. These tests were made on the Aurora 2 database, with different signal-to-noise (SNRs) ratios. In the speech recognition tests, the proposed MFB VAD outperformed all the three VAD algorithms used in the standards by 14.19% relative (G.723.1 VAD), by 12.84% relative (G.729 VAD), and by 4.17% relative (DSR VAD) in all SNRs.</abstract><cop>New York, NY</cop><pub>Hindawi Publishing Corporation</pub><doi>10.1155/ASP.2005.487</doi><tpages>11</tpages><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1110-8657 |
ispartof | EURASIP Journal on Applied Signal Processing, 2005-03, Vol.2005 (4), p.487-497, Article 561951 |
issn | 1110-8657; 1687-6180 |
language | eng |
recordid | cdi_proquest_miscellaneous_28198102 |
source | DOAJ Directory of Open Access Journals; SpringerLink Journals; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; Springer Nature OA Free Journals |
subjects | Applied sciences; Detection, estimation, filtering, equalization, prediction; Exact sciences and technology; Information, signal and communications theory; Signal and communications theory; Signal processing; Signal, noise; Speech processing; Systems, networks and services of telecommunications; Telecommunications; Telecommunications and information theory; Transmission and modulation (techniques and equipments) |
title | A computationally efficient mel-filter bank VAD algorithm for distributed speech recognition systems |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-20T11%3A55%3A25IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20computationally%20efficient%20mel-filter%20bank%20VAD%20algorithm%20for%20distributed%20speech%20recognition%20systems&rft.jtitle=EURASIP%20Journal%20on%20Applied%20Signal%20Processing&rft.au=VLAJ,%20Damjan&rft.date=2005-03-15&rft.volume=2005&rft.issue=4&rft.spage=487&rft.epage=497&rft.pages=487-497&rft.artnum=561951&rft.issn=1110-8657&rft.eissn=1687-6180&rft_id=info:doi/10.1155/ASP.2005.487&rft_dat=%3Cproquest_cross%3E28198102%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=28198102&rft_id=info:pmid/&rfr_iscdi=true |
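
The abstract in this record describes VAD decisions based on mel-filter bank (MFB) outputs combined with a hangover criterion, but the record does not give the decision rule itself. The following is a minimal illustrative sketch, not the authors' algorithm. It assumes the MFB outputs have already been computed per frame, and every name and value in it (`frame_log_energy`, `vad_with_hangover`, `noise_frames`, `margin`, `hangover`, the log-energy threshold rule) is a hypothetical choice made only for illustration.

```python
import math
from typing import List, Sequence


def frame_log_energy(mfb_outputs: Sequence[float]) -> float:
    """Log of the summed mel-filter bank outputs for one frame."""
    # Small constant avoids log(0) on all-zero frames.
    return math.log(sum(mfb_outputs) + 1e-10)


def vad_with_hangover(
    frames: Sequence[Sequence[float]],
    noise_frames: int = 5,   # assumption: leading frames contain noise only
    margin: float = 1.0,     # assumption: log-energy margin above the noise floor
    hangover: int = 3,       # assumption: frames kept as speech after energy drops
) -> List[bool]:
    """Return one speech/non-speech flag per frame.

    A frame is flagged as speech when its log energy exceeds the
    estimated noise floor by `margin`. After the last such frame, the
    next `hangover` frames are still flagged as speech, so that weak
    word endings are not cut off.
    """
    if not frames:
        return []
    energies = [frame_log_energy(f) for f in frames]
    n = min(noise_frames, len(energies))
    noise_floor = sum(energies[:n]) / n

    decisions: List[bool] = []
    remaining = 0  # hangover frames still to be emitted as speech
    for energy in energies:
        if energy > noise_floor + margin:
            decisions.append(True)
            remaining = hangover
        elif remaining > 0:
            decisions.append(True)
            remaining -= 1
        else:
            decisions.append(False)
    return decisions


# Toy input: 5 quiet frames, 2 loud frames, 5 quiet frames (4 channels each).
quiet, loud = [0.01] * 4, [1.0] * 4
flags = vad_with_hangover([quiet] * 5 + [loud] * 2 + [quiet] * 5, hangover=2)
print(flags)
```

With `hangover=2`, the two loud frames and the two quiet frames that follow them are flagged as speech. In a DSR setting such as the one the abstract describes, only the feature vectors of flagged frames would then be sent over the transmission channel.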