FEATURE-FUSION BASED AUDIO-VISUAL SPEECH RECOGNITION USING LIP GEOMETRY FEATURES IN NOISY ENVIRONMENT

Humans are often able to compensate for noise degradation and uncertainty in speech information by augmenting the received audio with visual information. Such bimodal perception generates a rich combination of information that can be used in the recognition of speech. However, due to wide variability in the lip movement involved in articulation, not all speech can be substantially improved by audio-visual integration. This paper describes a feature-fusion audio-visual speech recognition (AVSR) system that extracts lip geometry from the mouth region using a combination of skin color filter, border following and convex hull, and classification using a Hidden Markov Model. The comparison of the new approach with a conventional audio-only system is made when operating under simulated ambient noise conditions that affect the spoken phrases. The experimental results demonstrate that, in the presence of audio noise, the audio-visual approach significantly improves speech recognition accuracy compared with the audio-only approach.
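The visual front-end described in the abstract combines a skin-color filter, border following and a convex hull to obtain lip geometry. The following is only a minimal sketch of that idea, assuming OpenCV and NumPy, a mouth region already cropped by a face detector, and illustrative HSV thresholds and geometry features rather than the authors' published parameters (cv2.findContours implements Suzuki's border-following algorithm).

```python
# Sketch of a lip-geometry front-end: color filter, border following, convex hull.
# Thresholds and the feature set are illustrative assumptions, not the paper's values.
# Requires OpenCV >= 4 and NumPy.
import cv2
import numpy as np

def lip_geometry_features(mouth_roi_bgr):
    """Return a small geometry feature vector for one mouth-region frame."""
    hsv = cv2.cvtColor(mouth_roi_bgr, cv2.COLOR_BGR2HSV)

    # Color filter: keep reddish lip/skin pixels (illustrative thresholds).
    mask = cv2.inRange(hsv, np.array([0, 40, 60]), np.array([20, 255, 255]))
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, np.ones((5, 5), np.uint8))

    # Border following: extract external contours of the filtered region.
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return np.zeros(5, dtype=np.float32)

    # Largest contour is taken as the lip boundary and wrapped in a convex hull.
    lip = max(contours, key=cv2.contourArea)
    hull = cv2.convexHull(lip)

    x, y, w, h = cv2.boundingRect(hull)
    area = cv2.contourArea(lip)
    hull_area = cv2.contourArea(hull)

    # Simple geometry descriptors: width, height, aspect ratio, contour and hull areas.
    return np.array([w, h, h / max(w, 1), area, hull_area], dtype=np.float32)
```

For the recognition stage, the paper uses feature fusion followed by Hidden Markov Model classification. The sketch below, again an assumption-laden illustration rather than the authors' implementation, concatenates frame-synchronous audio features (e.g. MFCCs) with the lip-geometry vectors and trains one Gaussian HMM per word using the hmmlearn package. Frame-rate alignment between audio and video is assumed to have been done beforehand, and the model sizes are guesses rather than the paper's settings.

```python
# Sketch of feature-level fusion and per-word Gaussian HMM classification.
# Assumes hmmlearn is installed and feature sequences are 2-D arrays (frames x dims).
import numpy as np
from hmmlearn import hmm

def fuse(audio_frames, visual_frames):
    """Feature fusion: concatenate audio and visual features frame by frame."""
    n = min(len(audio_frames), len(visual_frames))
    return np.hstack([audio_frames[:n], visual_frames[:n]])

def train_word_models(training_data, n_states=5):
    """Train one Gaussian HMM per word from {word: [fused sequence, ...]}."""
    models = {}
    for word, sequences in training_data.items():
        X = np.vstack(sequences)                 # stack all sequences for this word
        lengths = [len(s) for s in sequences]    # hmmlearn needs per-sequence lengths
        m = hmm.GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=20)
        m.fit(X, lengths)
        models[word] = m
    return models

def recognise(models, fused_sequence):
    """Pick the word whose HMM gives the highest log-likelihood."""
    return max(models, key=lambda w: models[w].score(fused_sequence))
```

Recognition then amounts to scoring an unseen fused sequence against every word model and choosing the highest log-likelihood; evaluating the audio-only baseline corresponds to training and scoring on the audio features alone.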

Bibliographic details
Published in: ARPN Journal of Engineering and Applied Sciences, 2015-12, Vol. 10 (23), p. 17521-17527
Authors: Ibrahim, M Z; Mulvaney, D J; Abas, M F
Format: Article
Language: English
Subjects: Audiovisual; Mathematical models; Mouth; Noise; Perception; Sound filters; Speech; Speech recognition
ISSN: 1819-6608
Online access: Full text