Improved vowel region detection from a continuous speech using post processing of vowel onset points and vowel end-points
Vowels are produced with an open configuration of the vocal tract, without any audible friction. The acoustic signal is relatively loud with varying strength of impulse-like excitation. Vowels possess significant energy content in the low-frequency bands of the speech signal. Acoustic events such as...
Published in: | Multimedia tools and applications 2018-02, Vol.77 (4), p.4753-4767 |
---|---|
Main authors: | Thirumuru, Ramakrishna ; Gangashetty, Suryakanth V. ; Vuppala, Anil Kumar |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 4767 |
---|---|
container_issue | 4 |
container_start_page | 4753 |
container_title | Multimedia tools and applications |
container_volume | 77 |
creator | Thirumuru, Ramakrishna ; Gangashetty, Suryakanth V. ; Vuppala, Anil Kumar |
description | Vowels are produced with an open configuration of the vocal tract, without any audible friction. The acoustic signal is relatively loud, with varying strength of impulse-like excitation. Vowels carry significant energy in the low-frequency bands of the speech signal. Acoustic events such as the vowel onset point (VOP) and vowel end-point (VEP) can be used as landmarks to detect vowel regions in a speech signal. In this paper, a two-stage algorithm is proposed to detect precise vowel regions. In the first stage, the speech signal is processed using zero-frequency filtering to emphasize the energy in its low-frequency bands. The zero-frequency-filtered signal predominantly contains the low-frequency content of the speech signal because it is filtered around 0 Hz. This is followed by the extraction of dominant spectral peaks from the magnitude spectrum around the glottal closure regions of the speech signal. The VOPs and VEPs are obtained by convolving the enhanced spectral contour of the zero-frequency-filtered signal with a first-order Gaussian differentiator. In the second stage, post-processing is carried out in the regions around each VOP and VEP to remove spurious vowel regions based on the uniformity of epoch intervals. In addition, the positions of the VOPs and VEPs are corrected using the strength of excitation of the speech signal. The proposed vowel region detection method is compared with existing state-of-the-art methods on the TIMIT acoustic-phonetic speech corpus. The method is reported to give a significant improvement in vowel region detection in clean and noisy environments. |
doi_str_mv | 10.1007/s11042-017-5044-8 |
format | Article |
addtitle | Multimed Tools Appl |
date | 2018-02-01 |
publisher | New York: Springer US |
rights | Springer Science+Business Media, LLC 2017 |
tpages | 15 |
fulltext | fulltext |
identifier | ISSN: 1380-7501 |
ispartof | Multimedia tools and applications, 2018-02, Vol.77 (4), p.4753-4767 |
issn | 1380-7501 1573-7721 |
language | eng |
recordid | cdi_proquest_journals_2002610234 |
source | SpringerLink Journals - AutoHoldings |
subjects | Acoustic noise ; Acoustics ; Computer Communication Networks ; Computer Science ; Data Structures and Information Theory ; Excitation ; Filtration ; Frequencies ; Gaussian process ; Multimedia Information Systems ; Post-production processing ; Signal processing ; Special Purpose and Application-Based Systems ; Speech ; Speech disorders ; Verbal learning ; Vocal tract ; Voice recognition ; Vowels |
title | Improved vowel region detection from a continuous speech using post processing of vowel onset points and vowel end-points |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-26T03%3A48%3A50IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Improved%20vowel%20region%20detection%20from%20a%20continuous%20speech%20using%20post%20processing%20of%20vowel%20onset%20points%20and%20vowel%20end-points&rft.jtitle=Multimedia%20tools%20and%20applications&rft.au=Thirumuru,%20Ramakrishna&rft.date=2018-02-01&rft.volume=77&rft.issue=4&rft.spage=4753&rft.epage=4767&rft.pages=4753-4767&rft.issn=1380-7501&rft.eissn=1573-7721&rft_id=info:doi/10.1007/s11042-017-5044-8&rft_dat=%3Cproquest_cross%3E2002610234%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2002610234&rft_id=info:pmid/&rfr_iscdi=true |
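
The description above says that vowel onset and end-points are obtained by convolving an enhanced spectral contour with a first-order Gaussian differentiator. The following is a minimal sketch of that single step only, not the authors' implementation. It assumes the contour is already available as a 1-D array; the function names `fogd_kernel` and `detect_vop_vep`, the kernel length, and the value of `sigma` are illustrative choices that do not come from the record, and the contour here is a synthetic rectangle rather than one derived from zero-frequency-filtered speech.

```python
import numpy as np


def fogd_kernel(length: int = 101, sigma: float = 17.0) -> np.ndarray:
    """First-order difference of a Gaussian window (hypothetical helper).

    The result is positive over the rising half of the Gaussian and
    negative over the falling half.
    """
    n = np.arange(length) - (length - 1) / 2.0
    gauss = np.exp(-0.5 * (n / sigma) ** 2)
    return np.diff(gauss)


def detect_vop_vep(contour: np.ndarray, length: int = 101, sigma: float = 17.0):
    """Return (vop, vep, evidence) for a contour holding one vowel-like region.

    Convolving with the kernel turns a rising edge into a positive peak
    and a falling edge into a negative peak, so the largest value marks an
    onset candidate and the smallest value marks an end-point candidate.
    """
    evidence = np.convolve(contour, fogd_kernel(length, sigma), mode="same")
    vop = int(np.argmax(evidence))
    vep = int(np.argmin(evidence))
    return vop, vep, evidence


if __name__ == "__main__":
    # Synthetic contour: flat background with one high-energy region.
    contour = np.zeros(1000)
    contour[300:700] = 1.0
    vop, vep, _ = detect_vop_vep(contour)
    print("onset candidate:", vop, "end-point candidate:", vep)
```

On this synthetic input the two candidates should fall close to the edges of the high-energy region. A real contour would contain several regions, so peak picking with a threshold would replace the single `argmax`/`argmin`, and the post-processing described in the abstract (epoch-interval uniformity, excitation strength) is not covered here.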