Visual Speech Recognition Using Optical Flow and Hidden Markov Model
This work proposes audio-visual speech recognition using Gammatone frequency cepstral coefficient (GFCC) and optical flow (OF) features on a Hindi speech database. OF refers to the distribution of apparent velocities of brightness-pattern movements in an image. In this technique...
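Published in: | Wireless personal communications 2019-06, Vol.106 (4), p.2129-2147 |
---|---|
Main authors: | Sharma, Usha; Maheshkar, Sushila; Mishra, A. N.; Kaushik, Rahul |
Format: | Article |
Language: | eng |
Subjects: | Communications Engineering; Computer Communication Networks; Digits; Engineering; Feature recognition; Markov analysis; Markov chains; Networks; Optical flow (image analysis); Signal, Image and Speech Processing; Speech recognition; Voice recognition |
Online access: | Full text |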
container_end_page | 2147 |
---|---|
container_issue | 4 |
container_start_page | 2129 |
container_title | Wireless personal communications |
container_volume | 106 |
creator | Sharma, Usha; Maheshkar, Sushila; Mishra, A. N.; Kaushik, Rahul |
description | This work proposes audio-visual speech recognition using Gammatone frequency cepstral coefficient (GFCC) and optical flow (OF) features on a Hindi speech database. OF refers to the distribution of apparent velocities of brightness-pattern movements in an image. In this technique, OF is determined without extracting the location and contours of each speaker's lips. The horizontal and vertical components of the flow velocities are calculated as the visual features. The visual features are then combined with the audio features using an early integration method, followed by classification using a hidden Markov model. Recognition performance on isolated Hindi digits was evaluated using GFCC features in both clean and noisy environments and compared with existing Mel frequency cepstral coefficient (MFCC) features. GFCC gives results almost comparable to MFCC in a clean environment; however, its performance degrades in a noisy environment. Furthermore, combining the visual features obtained by OF analysis with the GFCC audio features gives significant improvements in recognition performance of ~12%, ~12%, and ~14% at SNRs of 5 dB, 10 dB, and 20 dB, respectively, in a noisy environment. |
doi_str_mv | 10.1007/s11277-018-5930-z |
format | Article |
publisher | New York: Springer US |
rights | Springer Science+Business Media, LLC, part of Springer Nature 2018; Copyright Springer Nature B.V. 2019 |
toplevel | peer_reviewed; online_resources |
orcid | https://orcid.org/0000-0001-5481-7647 |
date | 2019-06-30 |
stitle | Wireless Pers Commun |
tpages | 19 |
linktopdf | https://link.springer.com/content/pdf/10.1007/s11277-018-5930-z |
linktohtml | https://link.springer.com/10.1007/s11277-018-5930-z |
fulltext | fulltext |
identifier | ISSN: 0929-6212 |
ispartof | Wireless personal communications, 2019-06, Vol.106 (4), p.2129-2147 |
issn | 0929-6212; 1572-834X |
language | eng |
recordid | cdi_proquest_journals_2227897735 |
source | SpringerLink Journals |
subjects | Communications Engineering; Computer Communication Networks; Digits; Engineering; Feature recognition; Markov analysis; Markov chains; Networks; Optical flow (image analysis); Signal, Image and Speech Processing; Speech recognition; Voice recognition |
title | Visual Speech Recognition Using Optical Flow and Hidden Markov Model |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-25T09%3A34%3A21IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Visual%20Speech%20Recognition%20Using%20Optical%20Flow%20and%20Hidden%20Markov%20Model&rft.jtitle=Wireless%20personal%20communications&rft.au=Sharma,%20Usha&rft.date=2019-06-30&rft.volume=106&rft.issue=4&rft.spage=2129&rft.epage=2147&rft.pages=2129-2147&rft.issn=0929-6212&rft.eissn=1572-834X&rft_id=info:doi/10.1007/s11277-018-5930-z&rft_dat=%3Cproquest_cross%3E2227897735%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2227897735&rft_id=info:pmid/&rfr_iscdi=true |
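
The early integration step mentioned in the description can be illustrated with a small sketch. This is not the authors' implementation: the record only states that visual features (horizontal and vertical flow components) are combined with audio features before classification with a hidden Markov model. The function names, the 4:1 audio-to-video frame-rate ratio, the repetition-based upsampling, and the toy feature values below are all assumptions chosen for illustration.

```python
from typing import List, Sequence


def upsample_by_repetition(frames: Sequence[Sequence[float]], factor: int) -> List[List[float]]:
    """Repeat each visual frame `factor` times so it lines up with the audio frame rate.

    Assumption: video frames arrive at a lower rate than audio feature frames.
    """
    return [list(frame) for frame in frames for _ in range(factor)]


def early_integration(
    audio: Sequence[Sequence[float]],
    visual: Sequence[Sequence[float]],
    factor: int,
) -> List[List[float]]:
    """Concatenate audio and visual features frame by frame (feature-level fusion).

    The fused sequence is what a single classifier, such as an HMM, would be trained on.
    """
    visual_up = upsample_by_repetition(visual, factor)
    n_frames = min(len(audio), len(visual_up))  # trim to the shorter stream
    return [list(audio[t]) + visual_up[t] for t in range(n_frames)]


# Toy data (hypothetical): 8 audio frames with 2 coefficients each,
# and 2 video frames holding (horizontal, vertical) flow components.
audio_feats = [[float(t), float(10 + t)] for t in range(8)]
visual_feats = [[1.0, -1.0], [2.0, -2.0]]

fused = early_integration(audio_feats, visual_feats, factor=4)
print(len(fused), len(fused[0]))
```

Each fused vector carries the audio coefficients followed by the two flow components, so the classifier sees one joint observation per frame. Late integration, by contrast, would train separate classifiers per modality and combine their scores.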