Application of non-negative frequency-weighted energy operator for vowel region detection

Bibliographic details
Published in:	International journal of speech technology, 2018-06, Vol.21 (2), p.279-291
Main authors:	Thirumuru, Ramakrishna; Vuppala, Anil Kumar
Format:	Article
Language:	English
Subjects:	Acoustic noise; Acoustic phonetics; Algorithms; Artificial Intelligence; Continuous speech; Cues; Energy consumption; Engineering; False alarms; Operators (mathematics); Signal analysis; Signal, Image and Speech Processing; Social Sciences; Speech; Speech disorders; Voice recognition; Vowels
Online access:	Full text

container_end_page	291
container_issue	2
container_start_page	279
container_title	International journal of speech technology
container_volume	21
creator	Thirumuru, Ramakrishna; Vuppala, Anil Kumar
description	In this paper, a novel technique is proposed for detecting vowel regions in continuous speech using the envelope of the derivative of the speech signal, which is a non-negative, frequency-weighted energy operator. The proposed vowel region detection method is implemented as a two-stage algorithm. The first stage analyzes the speech signal to detect vowel onset points (VOPs) and vowel end-points (VEPs) using an instantaneous energy contour obtained from the envelope of the derivative of the speech signal. The VOPs and VEPs are located with a peak-finding algorithm based on the first-order Gaussian differentiator. The second stage removes spurious vowel regions and corrects the hypothesized VOP and VEP locations using combined cues from the uniformity of epoch intervals and the strength of excitation of the speech signal. The performance of the proposed method is evaluated on the TIMIT acoustic-phonetic speech corpus. The proposed approach achieved a significantly higher detection rate and a lower false alarm rate than state-of-the-art methods in both clean and noisy environments.
doi_str_mv	10.1007/s10772-018-9505-x
format	Article
fullrecord	<record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2038764088</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2038764088</sourcerecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2038764088</pqid></control><display><type>article</type><title>Application of non-negative frequency-weighted energy operator for vowel region detection</title><source>SpringerLink_现刊</source><creator>Thirumuru, Ramakrishna ; Vuppala, Anil Kumar</creator><creatorcontrib>Thirumuru, Ramakrishna ; Vuppala, Anil Kumar</creatorcontrib><identifier>ISSN: 1381-2416</identifier><identifier>EISSN: 1572-8110</identifier><identifier>DOI: 10.1007/s10772-018-9505-x</identifier><language>eng</language><publisher>New York: Springer US</publisher><subject>Acoustic noise ; Acoustic phonetics ; Algorithms ; Artificial Intelligence ; Continuous speech ; Cues ; Energy consumption ; Engineering ; False alarms ; Operators (mathematics) ; Signal analysis ; Signal,Image and Speech Processing ; Social Sciences ; Speech ; Speech disorders ; Voice recognition ; Vowels</subject><ispartof>International journal of speech technology, 2018-06, Vol.21 (2), p.279-291</ispartof><rights>Springer Science+Business Media, LLC, part of Springer Nature 2018</rights><rights>Copyright Springer Science &amp; Business Media 2018</rights><lds50>peer_reviewed</lds50></display><links><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/s10772-018-9505-x$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1007/s10772-018-9505-x$$EHTML$$P50$$Gspringer$$H</linktohtml></links><addata><au>Thirumuru, Ramakrishna</au><au>Vuppala, Anil Kumar</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Application of non-negative frequency-weighted energy operator for vowel region detection</atitle><jtitle>International journal of speech technology</jtitle><stitle>Int J Speech Technol</stitle><date>2018-06-01</date><risdate>2018</risdate><volume>21</volume><issue>2</issue><spage>279</spage><epage>291</epage><pages>279-291</pages><issn>1381-2416</issn><eissn>1572-8110</eissn><cop>New York</cop><pub>Springer US</pub><doi>10.1007/s10772-018-9505-x</doi><tpages>13</tpages></addata></record>
fulltext	fulltext
identifier	ISSN: 1381-2416
ispartof	International journal of speech technology, 2018-06, Vol.21 (2), p.279-291
issn	1381-2416 1572-8110
language	eng
recordid	cdi_proquest_journals_2038764088
source	SpringerLink_现刊
subjects	Acoustic noise; Acoustic phonetics; Algorithms; Artificial Intelligence; Continuous speech; Cues; Energy consumption; Engineering; False alarms; Operators (mathematics); Signal analysis; Signal, Image and Speech Processing; Social Sciences; Speech; Speech disorders; Voice recognition; Vowels
title	Application of non-negative frequency-weighted energy operator for vowel region detection
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-25T07%3A15%3A24IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Application%20of%20non-negative%20frequency-weighted%20energy%20operator%20for%20vowel%20region%20detection&rft.jtitle=International%20journal%20of%20speech%20technology&rft.au=Thirumuru,%20Ramakrishna&rft.date=2018-06-01&rft.volume=21&rft.issue=2&rft.spage=279&rft.epage=291&rft.pages=279-291&rft.issn=1381-2416&rft.eissn=1572-8110&rft_id=info:doi/10.1007/s10772-018-9505-x&rft_dat=%3Cproquest_cross%3E2038764088%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2038764088&rft_id=info:pmid/&rfr_iscdi=true
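The description field above summarizes the first stage (VOP and VEP detection) only at a high level. The Python sketch below shows one possible reading of it. It is not the authors' implementation. It assumes that the non-negative frequency-weighted energy operator is the squared magnitude of the analytic signal of the first difference of the speech signal. It also assumes that VOP and VEP candidates are the positive and negative peaks of the smoothed energy contour after convolution with a first-order Gaussian differentiator. All function names (`nfeo_contour`, `gaussian_differentiator`, `pick_peaks`, `detect_vop_vep`) are hypothetical, and so are all parameter values: 20 ms smoothing, a 100 ms kernel, a 16 ms sigma, a 0.2 relative threshold, and a 50 ms minimum gap. The second stage, which uses epoch-interval uniformity and excitation strength, is not modelled.

```python
import numpy as np


def analytic_signal(x):
    """FFT-based analytic signal (same construction as scipy.signal.hilbert)."""
    n = len(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * h)


def nfeo_contour(x, fs, smooth_ms=20.0):
    """Squared envelope of the first difference of x, smoothed by a moving average.

    Assumption: this stands in for the non-negative frequency-weighted energy
    operator. Differencing approximately scales each component's amplitude in
    proportion to its frequency, and the envelope keeps the result non-negative.
    """
    dx = np.diff(x, prepend=x[0])                 # first difference ~ derivative
    energy = np.abs(analytic_signal(dx)) ** 2     # squared envelope, always >= 0
    win = max(1, int(fs * smooth_ms / 1000))      # hypothetical smoothing length
    return np.convolve(energy, np.ones(win) / win, mode="same")


def gaussian_differentiator(fs, length_ms=100.0, sigma_ms=16.0):
    """Derivative-of-Gaussian kernel (hypothetical length and sigma).

    Convolving with it approximates the derivative of the Gaussian-smoothed
    input, so rising edges give positive peaks and falling edges negative ones.
    """
    half = int(fs * length_ms / 2000)
    t = np.arange(-half, half + 1)
    sigma = fs * sigma_ms / 1000
    return -t * np.exp(-t ** 2 / (2 * sigma ** 2)) / sigma ** 2


def pick_peaks(y, threshold, min_gap):
    """Local maxima of y above threshold, keeping the strongest within min_gap."""
    candidates = [i for i in range(1, len(y) - 1)
                  if y[i] > threshold and y[i] >= y[i - 1] and y[i] > y[i + 1]]
    kept = []
    for i in sorted(candidates, key=lambda i: y[i], reverse=True):
        if all(abs(i - k) >= min_gap for k in kept):
            kept.append(i)
    return sorted(kept)


def detect_vop_vep(x, fs, rel_threshold=0.2):
    """Hypothetical stage-1 sketch: return (VOP, VEP) candidate sample indices.

    The signal should be longer than the 100 ms kernel so that
    mode="same" keeps the evidence aligned with the input.
    """
    contour = nfeo_contour(x, fs)
    contour = contour / (contour.max() + 1e-12)           # scale to [0, 1]
    evidence = np.convolve(contour, gaussian_differentiator(fs), mode="same")
    threshold = rel_threshold * np.abs(evidence).max()
    min_gap = int(0.05 * fs)                               # 50 ms, hypothetical
    vops = pick_peaks(evidence, threshold, min_gap)        # energy rising
    veps = pick_peaks(-evidence, threshold, min_gap)       # energy falling
    return vops, veps


if __name__ == "__main__":
    fs = 8000
    t = np.arange(int(0.6 * fs)) / fs
    x = np.zeros_like(t)
    burst = (t >= 0.2) & (t < 0.4)    # synthetic 200 ms "vowel", 0.2 s to 0.4 s
    x[burst] = np.sin(2 * np.pi * 200 * t[burst])
    vops, veps = detect_vop_vep(x, fs)
    print("VOPs (s):", [v / fs for v in vops], "VEPs (s):", [v / fs for v in veps])
```

On the synthetic 200 Hz burst, the sketch should report one onset close to 0.2 s and one end-point close to 0.4 s. Real speech produces many more candidates. Rejecting the spurious ones is what the second-stage cues in the description are for.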