Voice Activity Detection Using Wavelet-Based Multiresolution Spectrum and Support Vector Machines and Audio Mixing Algorithm

This paper presents a Voice Activity Detection (VAD) algorithm and an efficient speech mixing algorithm for multimedia conferencing. The proposed VAD uses the MFCCs of a wavelet-based multiresolution spectrum, together with two classical audio parameters, as audio features; it prejudges silence by detecting the multi-...
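The full abstract in the description field of this record says that silence is prejudged by a "multi-gate zero cross ratio" before an SVM separates noise from voice. The sketch below shows one plausible reading of that prejudgment step. It assumes that "multi-gate" means zero crossings counted with several symmetric amplitude thresholds (a dead band of ±gate), that frames are floats in [-1, 1], and that a frame counts as silent when every gate's crossing rate is below a per-gate limit. The function names, gate values, and limits are hypothetical illustrations, not the authors' method. The wavelet-MFCC features and the SVM stage are not covered.

```python
from typing import List, Sequence


def gated_crossing_count(frame: Sequence[float], gate: float) -> int:
    """Count sign changes that clear a dead band of +/-gate (hysteresis).

    A crossing is counted only when the signal moves from above +gate to below
    -gate, or the reverse, so low-level noise inside the band is ignored.
    """
    state = 0  # +1: last left the band upward, -1: downward, 0: not yet left it
    count = 0
    for x in frame:
        if x > gate:
            if state == -1:
                count += 1
            state = 1
        elif x < -gate:
            if state == 1:
                count += 1
            state = -1
    return count


def multi_gate_crossing_rates(frame: Sequence[float], gates: Sequence[float]) -> List[float]:
    """Crossings per sample, one value per gate (0.0 for an empty frame)."""
    n = len(frame)
    return [gated_crossing_count(frame, g) / n if n else 0.0 for g in gates]


def prejudge_silence(
    frame: Sequence[float],
    gates: Sequence[float] = (0.01, 0.02, 0.05),   # hypothetical gate levels
    limits: Sequence[float] = (0.05, 0.03, 0.01),  # hypothetical per-gate rate limits
) -> bool:
    """Hypothetical rule: silent if every gate's crossing rate is below its limit.

    Frames that are not prejudged silent would go on to feature extraction and
    the noise/voice classifier, which this sketch does not implement.
    """
    if len(gates) != len(limits):
        raise ValueError("need exactly one rate limit per gate")
    rates = multi_gate_crossing_rates(frame, gates)
    return all(r < lim for r, lim in zip(rates, limits))


if __name__ == "__main__":
    quiet = [0.001, -0.001] * 80  # 160 samples of very low-level noise
    loud = [0.5, -0.5] * 80       # 160 samples of a strong alternating signal
    print(prejudge_silence(quiet), prejudge_silence(loud))  # True False
```

The dead band is why gated counting can reject low-level background noise. A plain zero-crossing count would register many crossings for such noise, while gated counts stay near zero until the signal clears the band.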

Bibliographic details
Main authors:	Xue, Wei; Du, Sidan; Fang, Chengzhi; Ye, Yingxian
Format:	Conference proceeding
Language:	English
Subjects:	Acoustics; Applied sciences; Artificial intelligence; Computer science, control theory, systems; Computer systems and distributed systems. User interface; Data processing. List processing. Character string processing; Exact sciences and technology; Fundamental areas of phenomenology (including applications); Memory organisation. Data processing; Pattern recognition. Digital image processing. Computational geometry; Physics; Software; Transduction, acoustical devices for the generation and reproduction of sound
Online access:	Full text

container_end_page	88
container_issue
container_start_page	78
container_title
container_volume
creator	Xue, Wei; Du, Sidan; Fang, Chengzhi; Ye, Yingxian
description	This paper presents a Voice Activity Detection (VAD) algorithm and an efficient speech mixing algorithm for multimedia conferencing. The proposed VAD uses the MFCCs of a wavelet-based multiresolution spectrum, together with two classical audio parameters, as audio features; it prejudges silence by detecting the multi-gate zero-crossing ratio and classifies noise and voice with Support Vector Machines (SVM). The new speech mixing algorithm for the Multipoint Control Unit (MCU) of a conference takes the short-time power of each audio stream as the mixing weight vector and is designed for parallel processing in software. Experiments show that the proposed VAD algorithm achieves better overall performance than the G.729b VAD and other VADs at all SNRs, that the output of the new speech mixing algorithm has excellent perceptual quality, that its computational delay is small enough to meet the needs of real-time transmission, and that the MCU's computational load is lower than with the G.729b VAD.
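The description states that the MCU mixer takes the short-time power of each audio stream as its mixing weight vector. The following is a minimal Python sketch of that idea. It assumes frame-synchronous, equal-length float frames in [-1, 1], short-time power defined as the mean squared amplitude of a frame, and weights normalized to sum to 1. The normalization, the frame format, and all function names are illustrative assumptions, not the authors' implementation.

```python
from typing import List, Sequence


def short_time_power(frame: Sequence[float]) -> float:
    """Mean squared amplitude of one frame (0.0 for an empty frame)."""
    if not frame:
        return 0.0
    return sum(x * x for x in frame) / len(frame)


def mixing_weights(frames: Sequence[Sequence[float]]) -> List[float]:
    """Weight each stream by its share of the total short-time power.

    Assumption: weights are normalized to sum to 1. If every stream is silent
    (all powers are zero), every weight is 0.
    """
    # Each stream's power is computed independently, so this step parallelizes.
    powers = [short_time_power(f) for f in frames]
    total = sum(powers)
    if total == 0.0:
        return [0.0] * len(frames)
    return [p / total for p in powers]


def mix_frames(frames: Sequence[Sequence[float]]) -> List[float]:
    """Power-weighted sum of equal-length frames, one frame per audio stream."""
    if not frames:
        return []
    n = len(frames[0])
    if any(len(f) != n for f in frames):
        raise ValueError("all frames must have the same length")
    weights = mixing_weights(frames)
    return [sum(w * f[i] for w, f in zip(weights, frames)) for i in range(n)]


if __name__ == "__main__":
    loud = [0.4, -0.4, 0.4, -0.4]   # short-time power about 0.16
    quiet = [0.1, 0.1, -0.1, -0.1]  # short-time power about 0.01
    print(mixing_weights([loud, quiet]))  # the louder stream gets most of the weight
    print(mix_frames([loud, quiet]))
```

Because the weights are non-negative and sum to 1, each output sample is a convex combination of the input samples. The mix therefore stays within the range of the inputs and cannot clip for inputs in [-1, 1]. The per-stream power computations are independent, which is one natural place for the parallel processing the description mentions.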
doi_str_mv	10.1007/11754336_8
format	Conference Proceeding
fullrecord	<record><control><sourceid>pascalfrancis_sprin</sourceid><recordid>TN_cdi_pascalfrancis_primary_19150861</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>19150861</sourcerecordid><originalsourceid>FETCH-LOGICAL-p218t-89e15a35b1d3dd63ae389144ba5c9092d77171bac5d105b49fc003f8abf0ade13</originalsourceid><addsrcrecordid>eNpFkEtPwzAQhM1LopRe-AW-IHEJeOM4iY-hPKVWHErLMXJspzWkcWQnFZX48aQUxF52tfNppBmELoBcAyHJDUDCIkrjPD1AZ5RFhEYhofEhGkAMEFAa8SM04kn6p4VwjAaEkjDgSURP0cj7d9IPhZiH6QB9LayRGmeyNRvTbvGdbnV_2xrPvamX-E1sdKXb4FZ4rfC0q1rjtLdV98PMmh523RqLWuFZ1zTWtXjR_6zDUyFXptb-R8s6ZSyems-dZ1YtrTPtan2OTkpReT363UM0f7h_HT8Fk5fH53E2CZoQ0jZIuQYmKCtAUaViKjRNOURRIZjkhIcqSSCBQkimgLAi4qXsA5apKEoilAY6RJd730Z4KarSiVoanzfOrIXb5sCBkTTecVd7zvdSvdQuL6z98DmQfNd-_t8-_QZfxnNz</addsrcrecordid><sourcetype>Index Database</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>Voice Activity Detection Using Wavelet-Based Multiresolution Spectrum and Support Vector Machines and Audio Mixing Algorithm</title><source>Springer Books</source><creator>Xue, Wei ; Du, Sidan ; Fang, Chengzhi ; Ye, Yingxian</creator><contributor>Galata, Aphrodite ; Kisačanin, Branislav ; Lew, Michael S. ; Sebe, Nicu ; Kölsch, Mathias ; Huang, Thomas S. ; Pavlović, Vladimir</contributor><creatorcontrib>Xue, Wei ; Du, Sidan ; Fang, Chengzhi ; Ye, Yingxian ; Galata, Aphrodite ; Kisačanin, Branislav ; Lew, Michael S. ; Sebe, Nicu ; Kölsch, Mathias ; Huang, Thomas S. ; Pavlović, Vladimir</creatorcontrib><description>This paper presents a Voice Activity Detection (VAD) algorithm and an efficient speech mixing algorithm for multimedia conferencing. The proposed VAD uses the MFCCs of a wavelet-based multiresolution spectrum, together with two classical audio parameters, as audio features; it prejudges silence by detecting the multi-gate zero-crossing ratio and classifies noise and voice with Support Vector Machines (SVM). 
The new speech mixing algorithm for the Multipoint Control Unit (MCU) of a conference takes the short-time power of each audio stream as the mixing weight vector and is designed for parallel processing in software. Experiments show that the proposed VAD algorithm achieves better overall performance than the G.729b VAD and other VADs at all SNRs, that the output of the new speech mixing algorithm has excellent perceptual quality, that its computational delay is small enough to meet the needs of real-time transmission, and that the MCU's computational load is lower than with the G.729b VAD.</description><identifier>ISSN: 0302-9743</identifier><identifier>ISBN: 9783540342021</identifier><identifier>ISBN: 3540342028</identifier><identifier>EISSN: 1611-3349</identifier><identifier>EISBN: 3540342036</identifier><identifier>EISBN: 9783540342038</identifier><identifier>DOI: 10.1007/11754336_8</identifier><language>eng</language><publisher>Berlin, Heidelberg: Springer Berlin Heidelberg</publisher><subject>Acoustics ; Applied sciences ; Artificial intelligence ; Computer science; control theory; systems ; Computer systems and distributed systems. User interface ; Data processing. List processing. Character string processing ; Exact sciences and technology ; Fundamental areas of phenomenology (including applications) ; Memory organisation. Data processing ; Pattern recognition. Digital image processing. 
Computational geometry ; Physics ; Software ; Transduction; acoustical devices for the generation and reproduction of sound</subject><ispartof>Computer Vision in Human-Computer Interaction, 2006, p.78-88</ispartof><rights>Springer-Verlag Berlin Heidelberg 2006</rights><rights>2007 INIST-CNRS</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/11754336_8$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1007/11754336_8$$EHTML$$P50$$Gspringer$$H</linktohtml><link.rule.ids>309,310,775,776,780,785,786,789,4036,4037,27902,38232,41418,42487</link.rule.ids><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&amp;idt=19150861$$DView record in Pascal Francis$$Hfree_for_read</backlink></links><search><contributor>Galata, Aphrodite</contributor><contributor>Kisačanin, Branislav</contributor><contributor>Lew, Michael S.</contributor><contributor>Sebe, Nicu</contributor><contributor>Kölsch, Mathias</contributor><contributor>Huang, Thomas S.</contributor><contributor>Pavlović, Vladimir</contributor><creatorcontrib>Xue, Wei</creatorcontrib><creatorcontrib>Du, Sidan</creatorcontrib><creatorcontrib>Fang, Chengzhi</creatorcontrib><creatorcontrib>Ye, Yingxian</creatorcontrib><title>Voice Activity Detection Using Wavelet-Based Multiresolution Spectrum and Support Vector Machines and Audio Mixing Algorithm</title><title>Computer Vision in Human-Computer Interaction</title><description>This paper presents a Voice Activity Detection (VAD) algorithm and an efficient speech mixing algorithm for multimedia conferencing. 
The proposed VAD uses the MFCCs of a wavelet-based multiresolution spectrum, together with two classical audio parameters, as audio features; it prejudges silence by detecting the multi-gate zero-crossing ratio and classifies noise and voice with Support Vector Machines (SVM). The new speech mixing algorithm for the Multipoint Control Unit (MCU) of a conference takes the short-time power of each audio stream as the mixing weight vector and is designed for parallel processing in software. Experiments show that the proposed VAD algorithm achieves better overall performance than the G.729b VAD and other VADs at all SNRs, that the output of the new speech mixing algorithm has excellent perceptual quality, that its computational delay is small enough to meet the needs of real-time transmission, and that the MCU's computational load is lower than with the G.729b VAD.</description><subject>Acoustics</subject><subject>Applied sciences</subject><subject>Artificial intelligence</subject><subject>Computer science; control theory; systems</subject><subject>Computer systems and distributed systems. User interface</subject><subject>Data processing. List processing. Character string processing</subject><subject>Exact sciences and technology</subject><subject>Fundamental areas of phenomenology (including applications)</subject><subject>Memory organisation. Data processing</subject><subject>Pattern recognition. Digital image processing. 
Computational geometry</subject><subject>Physics</subject><subject>Software</subject><subject>Transduction; acoustical devices for the generation and reproduction of sound</subject><issn>0302-9743</issn><issn>1611-3349</issn><isbn>9783540342021</isbn><isbn>3540342028</isbn><isbn>3540342036</isbn><isbn>9783540342038</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2006</creationdate><recordtype>conference_proceeding</recordtype><recordid>eNpFkEtPwzAQhM1LopRe-AW-IHEJeOM4iY-hPKVWHErLMXJspzWkcWQnFZX48aQUxF52tfNppBmELoBcAyHJDUDCIkrjPD1AZ5RFhEYhofEhGkAMEFAa8SM04kn6p4VwjAaEkjDgSURP0cj7d9IPhZiH6QB9LayRGmeyNRvTbvGdbnV_2xrPvamX-E1sdKXb4FZ4rfC0q1rjtLdV98PMmh523RqLWuFZ1zTWtXjR_6zDUyFXptb-R8s6ZSyems-dZ1YtrTPtan2OTkpReT363UM0f7h_HT8Fk5fH53E2CZoQ0jZIuQYmKCtAUaViKjRNOURRIZjkhIcqSSCBQkimgLAi4qXsA5apKEoilAY6RJd730Z4KarSiVoanzfOrIXb5sCBkTTecVd7zvdSvdQuL6z98DmQfNd-_t8-_QZfxnNz</recordid><startdate>2006</startdate><enddate>2006</enddate><creator>Xue, Wei</creator><creator>Du, Sidan</creator><creator>Fang, Chengzhi</creator><creator>Ye, Yingxian</creator><general>Springer Berlin Heidelberg</general><general>Springer</general><scope>IQODW</scope></search><sort><creationdate>2006</creationdate><title>Voice Activity Detection Using Wavelet-Based Multiresolution Spectrum and Support Vector Machines and Audio Mixing Algorithm</title><author>Xue, Wei ; Du, Sidan ; Fang, Chengzhi ; Ye, Yingxian</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-p218t-89e15a35b1d3dd63ae389144ba5c9092d77171bac5d105b49fc003f8abf0ade13</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2006</creationdate><topic>Acoustics</topic><topic>Applied sciences</topic><topic>Artificial intelligence</topic><topic>Computer science; control theory; systems</topic><topic>Computer systems and distributed systems. User interface</topic><topic>Data processing. List processing. 
Character string processing</topic><topic>Exact sciences and technology</topic><topic>Fundamental areas of phenomenology (including applications)</topic><topic>Memory organisation. Data processing</topic><topic>Pattern recognition. Digital image processing. Computational geometry</topic><topic>Physics</topic><topic>Software</topic><topic>Transduction; acoustical devices for the generation and reproduction of sound</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Xue, Wei</creatorcontrib><creatorcontrib>Du, Sidan</creatorcontrib><creatorcontrib>Fang, Chengzhi</creatorcontrib><creatorcontrib>Ye, Yingxian</creatorcontrib><collection>Pascal-Francis</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Xue, Wei</au><au>Du, Sidan</au><au>Fang, Chengzhi</au><au>Ye, Yingxian</au><au>Galata, Aphrodite</au><au>Kisačanin, Branislav</au><au>Lew, Michael S.</au><au>Sebe, Nicu</au><au>Kölsch, Mathias</au><au>Huang, Thomas S.</au><au>Pavlović, Vladimir</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>Voice Activity Detection Using Wavelet-Based Multiresolution Spectrum and Support Vector Machines and Audio Mixing Algorithm</atitle><btitle>Computer Vision in Human-Computer Interaction</btitle><date>2006</date><risdate>2006</risdate><spage>78</spage><epage>88</epage><pages>78-88</pages><issn>0302-9743</issn><eissn>1611-3349</eissn><isbn>9783540342021</isbn><isbn>3540342028</isbn><eisbn>3540342036</eisbn><eisbn>9783540342038</eisbn><abstract>This paper presents a Voice Activity Detection (VAD) algorithm and an efficient speech mixing algorithm for multimedia conferencing. The proposed VAD uses the MFCCs of a wavelet-based multiresolution spectrum, together with two classical audio parameters, as audio features; it prejudges silence by detecting the multi-gate zero-crossing ratio and classifies noise and voice with Support Vector Machines (SVM). 
The new speech mixing algorithm for the Multipoint Control Unit (MCU) of a conference takes the short-time power of each audio stream as the mixing weight vector and is designed for parallel processing in software. Experiments show that the proposed VAD algorithm achieves better overall performance than the G.729b VAD and other VADs at all SNRs, that the output of the new speech mixing algorithm has excellent perceptual quality, that its computational delay is small enough to meet the needs of real-time transmission, and that the MCU's computational load is lower than with the G.729b VAD.</abstract><cop>Berlin, Heidelberg</cop><pub>Springer Berlin Heidelberg</pub><doi>10.1007/11754336_8</doi><tpages>11</tpages></addata></record>
fulltext	fulltext
identifier	ISSN: 0302-9743
ispartof	Computer Vision in Human-Computer Interaction, 2006, p.78-88
issn	0302-9743 1611-3349
language	eng
recordid	cdi_pascalfrancis_primary_19150861
source	Springer Books
subjects	Acoustics; Applied sciences; Artificial intelligence; Computer science, control theory, systems; Computer systems and distributed systems. User interface; Data processing. List processing. Character string processing; Exact sciences and technology; Fundamental areas of phenomenology (including applications); Memory organisation. Data processing; Pattern recognition. Digital image processing. Computational geometry; Physics; Software; Transduction, acoustical devices for the generation and reproduction of sound
title	Voice Activity Detection Using Wavelet-Based Multiresolution Spectrum and Support Vector Machines and Audio Mixing Algorithm
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-28T13%3A59%3A44IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-pascalfrancis_sprin&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Voice%20Activity%20Detection%20Using%20Wavelet-Based%20Multiresolution%20Spectrum%20and%20Support%20Vector%20Machines%20and%20Audio%20Mixing%20Algorithm&rft.btitle=Computer%20Vision%20in%20Human-Computer%20Interaction&rft.au=Xue,%20Wei&rft.date=2006&rft.spage=78&rft.epage=88&rft.pages=78-88&rft.issn=0302-9743&rft.eissn=1611-3349&rft.isbn=9783540342021&rft.isbn_list=3540342028&rft_id=info:doi/10.1007/11754336_8&rft_dat=%3Cpascalfrancis_sprin%3E19150861%3C/pascalfrancis_sprin%3E%3Curl%3E%3C/url%3E&rft.eisbn=3540342036&rft.eisbn_list=9783540342038&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true