Visual Speech Recognition Using Optical Flow and Hidden Markov Model

The present work proposes audio-visual speech recognition using Gammatone frequency cepstral coefficient (GFCC) and optical flow (OF) features on a Hindi speech database. OF refers to the distribution of apparent velocities of brightness-pattern movement in an image. In this technique...

Bibliographic details
Published in: Wireless personal communications 2019-06, Vol.106 (4), p.2129-2147
Main authors: Sharma, Usha, Maheshkar, Sushila, Mishra, A. N., Kaushik, Rahul
Format: Article
Language: English
Online access: Full text
container_end_page 2147
container_issue 4
container_start_page 2129
container_title Wireless personal communications
container_volume 106
creator Sharma, Usha
Maheshkar, Sushila
Mishra, A. N.
Kaushik, Rahul
description The present work proposes audio-visual speech recognition using Gammatone frequency cepstral coefficient (GFCC) and optical flow (OF) features on a Hindi speech database. OF refers to the distribution of apparent velocities of brightness-pattern movement in an image. In this technique, OF is determined without extracting the location and contours of the individual speaker's lips. The horizontal and vertical components of the flow velocities are calculated as the visual features. The visual features are then combined with the audio features using an early integration method, followed by classification with a hidden Markov model. Recognition performance on isolated Hindi digits was evaluated using GFCC features in both clean and noisy environments and compared with existing Mel frequency cepstral coefficient (MFCC) features. GFCC gives results almost comparable to MFCC in a clean environment; however, its performance degrades in a noisy environment. Furthermore, combining the visual features obtained by OF analysis with the GFCC audio features significantly improves recognition performance in noisy environments, by ~12%, ~12%, and ~14% at SNRs of 5 dB, 10 dB, and 20 dB, respectively.
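The early integration step named in the description can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `early_integration`, the toy feature values, and the hold-and-repeat alignment between the two streams are all assumptions made for illustration. The three-value audio vectors stand in for GFCC frames and the two-value visual vectors stand in for the horizontal and vertical flow components.

```python
from typing import List, Sequence


def early_integration(audio: Sequence[Sequence[float]],
                      visual: Sequence[Sequence[float]]) -> List[List[float]]:
    """Fuse two feature streams by concatenating them frame by frame.

    Video usually yields fewer frames than audio analysis, so each audio
    frame is paired with the visual frame at the same relative position in
    time (hold-and-repeat alignment; an assumption of this sketch).
    """
    if not audio or not visual:
        raise ValueError("both feature streams must be non-empty")
    n_audio, n_visual = len(audio), len(visual)
    fused = []
    for t, audio_frame in enumerate(audio):
        # Map audio frame index t onto the sparser visual time axis.
        v_idx = min(t * n_visual // n_audio, n_visual - 1)
        fused.append(list(audio_frame) + list(visual[v_idx]))
    return fused


# Hypothetical toy data: 4 audio frames (3 values each) and
# 2 visual frames (horizontal and vertical flow component).
audio_feats = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6],
               [0.7, 0.8, 0.9], [1.0, 1.1, 1.2]]
visual_feats = [[5.0, -1.0], [6.0, -2.0]]

fused = early_integration(audio_feats, visual_feats)
for frame in fused:
    print(frame)
```

Each fused frame carries both modalities in one vector, so a single classifier (a hidden Markov model in the described work) would see one observation sequence rather than two separate streams.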
doi_str_mv 10.1007/s11277-018-5930-z
format Article
addtitle Wireless Pers Commun
date 2019-06-30
eissn 1572-834X
orcidid https://orcid.org/0000-0001-5481-7647
publisher New York: Springer US
rights Springer Science+Business Media, LLC, part of Springer Nature 2018
Copyright Springer Nature B.V. 2019
toplevel peer_reviewed
online_resources
tpages 19
fulltext fulltext
identifier ISSN: 0929-6212
ispartof Wireless personal communications, 2019-06, Vol.106 (4), p.2129-2147
issn 0929-6212
1572-834X
language eng
recordid cdi_proquest_journals_2227897735
source SpringerLink Journals
subjects Communications Engineering
Computer Communication Networks
Digits
Engineering
Feature recognition
Markov analysis
Markov chains
Networks
Optical flow (image analysis)
Signal,Image and Speech Processing
Speech recognition
Voice recognition
title Visual Speech Recognition Using Optical Flow and Hidden Markov Model
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-25T09%3A34%3A21IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Visual%20Speech%20Recognition%20Using%20Optical%20Flow%20and%20Hidden%20Markov%20Model&rft.jtitle=Wireless%20personal%20communications&rft.au=Sharma,%20Usha&rft.date=2019-06-30&rft.volume=106&rft.issue=4&rft.spage=2129&rft.epage=2147&rft.pages=2129-2147&rft.issn=0929-6212&rft.eissn=1572-834X&rft_id=info:doi/10.1007/s11277-018-5930-z&rft_dat=%3Cproquest_cross%3E2227897735%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2227897735&rft_id=info:pmid/&rfr_iscdi=true