Mode Variational LSTM Robust to Unseen Modes of Variation: Application to Facial Expression Recognition

Spatio-temporal feature encoding is essential for encoding the dynamics in video sequences. Recurrent neural networks, particularly long short-term memory (LSTM) units, have been popular as an efficient tool for encoding spatio-temporal features in sequences. In this work, we investigate the effect...

Bibliographic Details
Main Authors:	Baddar, Wissam J ; Ro, Yong Man
Format:	Article
Language:	eng
Subjects:	Computer Science - Computer Vision and Pattern Recognition ; Computer Science - Learning

container_end_page
container_issue
container_start_page
container_title
container_volume
creator	Baddar, Wissam J ; Ro, Yong Man
description	Spatio-temporal feature encoding is essential for encoding the dynamics in video sequences. Recurrent neural networks, particularly long short-term memory (LSTM) units, have been popular as an efficient tool for encoding spatio-temporal features in sequences. In this work, we investigate the effect of mode variations on the spatio-temporal features encoded by LSTMs. We show that the LSTM retains information related to the mode variation in the sequence, which is irrelevant to the task at hand (e.g. classification of facial expressions). In fact, the LSTM forget mechanism is not robust enough to mode variations and preserves information that could negatively affect the encoded spatio-temporal features. We propose the mode variational LSTM to encode spatio-temporal features robust to unseen modes of variation. The mode variational LSTM modifies the original LSTM structure by adding a cell state that focuses on encoding the mode variation in the input sequence. To efficiently regulate which features are stored in this additional cell state, additional gating functionality is also introduced. The effectiveness of the proposed mode variational LSTM is verified on the facial expression recognition task. Comparative experiments on publicly available datasets show that the proposed mode variational LSTM outperforms existing methods. Moreover, a new dynamic facial expression dataset with different modes of variation, such as pose and illumination variations, was collected to comprehensively evaluate the proposed mode variational LSTM. Experimental results show that the proposed mode variational LSTM encodes spatio-temporal features robust to unseen modes of variation.
doi_str_mv	10.48550/arxiv.1811.06937
format	Article
fullrecord	<record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_1811_06937</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>1811_06937</sourcerecordid><originalsourceid>FETCH-LOGICAL-a677-a4871be426665af3150338a3c0495053882433aa3261834c03f8354f72d692f63</originalsourceid><addsrcrecordid>eNpFj9FKwzAUhnPjhUwfwCvzAq1JTpKmuxtjU6FDmHW35axLRqA2paky315TBa8O_-H7f_gIueMsl0Yp9oDjxX_m3HCeM11CcU3Ou3Cy9ICjx8mHHjtavdY7ug_HjzjRKdC3Plrb04RFGtw_uqSrYeh8O4dEbrH1P_3NZRhtjOm5t2049z4BN-TKYRft7d9dkHq7qddPWfXy-LxeVRnqoshQmoIfrRRaa4UOuGIABqFlslRMgTFCAiCC0NyAbBk4A0q6Qpx0KZyGBbn_nZ1Nm2H07zh-Ncm4mY3hG4o-T8M</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Mode Variational LSTM Robust to Unseen Modes of Variation: Application to Facial Expression Recognition</title><source>arXiv.org</source><creator>Baddar, Wissam J ; Ro, Yong Man</creator><creatorcontrib>Baddar, Wissam J ; Ro, Yong Man</creatorcontrib><description>Spatio-temporal feature encoding is essential for encoding the dynamics in video sequences. Recurrent neural networks, particularly long short-term memory (LSTM) units, have been popular as an efficient tool for encoding spatio-temporal features in sequences. In this work, we investigate the effect of mode variations on the encoded spatio-temporal features using LSTMs. We show that the LSTM retains information related to the mode variation in the sequence, which is irrelevant to the task at hand (e.g. classification facial expressions). Actually, the LSTM forget mechanism is not robust enough to mode variations and preserves information that could negatively affect the encoded spatio-temporal features. We propose the mode variational LSTM to encode spatio-temporal features robust to unseen modes of variation. 
The mode variational LSTM modifies the original LSTM structure by adding an additional cell state that focuses on encoding the mode variation in the input sequence. To efficiently regulate what features should be stored in the additional cell state, additional gating functionality is also introduced. The effectiveness of the proposed mode variational LSTM is verified using the facial expression recognition task. Comparative experiments on publicly available datasets verified that the proposed mode variational LSTM outperforms existing methods. Moreover, a new dynamic facial expression dataset with different modes of variation, including various modes like pose and illumination variations, was collected to comprehensively evaluate the proposed mode variational LSTM. Experimental results verified that the proposed mode variational LSTM encodes spatio-temporal features robust to unseen modes of variation.</description><identifier>DOI: 10.48550/arxiv.1811.06937</identifier><language>eng</language><subject>Computer Science - Computer Vision and Pattern Recognition ; Computer Science - Learning</subject><creationdate>2018-11</creationdate><rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,777,882</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/1811.06937$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.1811.06937$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Baddar, Wissam J</creatorcontrib><creatorcontrib>Ro, Yong Man</creatorcontrib><title>Mode Variational LSTM Robust to Unseen Modes of Variation: Application to Facial Expression 
Recognition</title><description>Spatio-temporal feature encoding is essential for encoding the dynamics in video sequences. Recurrent neural networks, particularly long short-term memory (LSTM) units, have been popular as an efficient tool for encoding spatio-temporal features in sequences. In this work, we investigate the effect of mode variations on the encoded spatio-temporal features using LSTMs. We show that the LSTM retains information related to the mode variation in the sequence, which is irrelevant to the task at hand (e.g. classification facial expressions). Actually, the LSTM forget mechanism is not robust enough to mode variations and preserves information that could negatively affect the encoded spatio-temporal features. We propose the mode variational LSTM to encode spatio-temporal features robust to unseen modes of variation. The mode variational LSTM modifies the original LSTM structure by adding an additional cell state that focuses on encoding the mode variation in the input sequence. To efficiently regulate what features should be stored in the additional cell state, additional gating functionality is also introduced. The effectiveness of the proposed mode variational LSTM is verified using the facial expression recognition task. Comparative experiments on publicly available datasets verified that the proposed mode variational LSTM outperforms existing methods. Moreover, a new dynamic facial expression dataset with different modes of variation, including various modes like pose and illumination variations, was collected to comprehensively evaluate the proposed mode variational LSTM. 
Experimental results verified that the proposed mode variational LSTM encodes spatio-temporal features robust to unseen modes of variation.</description><subject>Computer Science - Computer Vision and Pattern Recognition</subject><subject>Computer Science - Learning</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2018</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNpFj9FKwzAUhnPjhUwfwCvzAq1JTpKmuxtjU6FDmHW35axLRqA2paky315TBa8O_-H7f_gIueMsl0Yp9oDjxX_m3HCeM11CcU3Ou3Cy9ICjx8mHHjtavdY7ug_HjzjRKdC3Plrb04RFGtw_uqSrYeh8O4dEbrH1P_3NZRhtjOm5t2049z4BN-TKYRft7d9dkHq7qddPWfXy-LxeVRnqoshQmoIfrRRaa4UOuGIABqFlslRMgTFCAiCC0NyAbBk4A0q6Qpx0KZyGBbn_nZ1Nm2H07zh-Ncm4mY3hG4o-T8M</recordid><startdate>20181116</startdate><enddate>20181116</enddate><creator>Baddar, Wissam J</creator><creator>Ro, Yong Man</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20181116</creationdate><title>Mode Variational LSTM Robust to Unseen Modes of Variation: Application to Facial Expression Recognition</title><author>Baddar, Wissam J ; Ro, Yong Man</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a677-a4871be426665af3150338a3c0495053882433aa3261834c03f8354f72d692f63</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2018</creationdate><topic>Computer Science - Computer Vision and Pattern Recognition</topic><topic>Computer Science - Learning</topic><toplevel>online_resources</toplevel><creatorcontrib>Baddar, Wissam J</creatorcontrib><creatorcontrib>Ro, Yong Man</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Baddar, Wissam J</au><au>Ro, Yong Man</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Mode Variational LSTM Robust to 
Unseen Modes of Variation: Application to Facial Expression Recognition</atitle><date>2018-11-16</date><risdate>2018</risdate><abstract>Spatio-temporal feature encoding is essential for encoding the dynamics in video sequences. Recurrent neural networks, particularly long short-term memory (LSTM) units, have been popular as an efficient tool for encoding spatio-temporal features in sequences. In this work, we investigate the effect of mode variations on the encoded spatio-temporal features using LSTMs. We show that the LSTM retains information related to the mode variation in the sequence, which is irrelevant to the task at hand (e.g. classification facial expressions). Actually, the LSTM forget mechanism is not robust enough to mode variations and preserves information that could negatively affect the encoded spatio-temporal features. We propose the mode variational LSTM to encode spatio-temporal features robust to unseen modes of variation. The mode variational LSTM modifies the original LSTM structure by adding an additional cell state that focuses on encoding the mode variation in the input sequence. To efficiently regulate what features should be stored in the additional cell state, additional gating functionality is also introduced. The effectiveness of the proposed mode variational LSTM is verified using the facial expression recognition task. Comparative experiments on publicly available datasets verified that the proposed mode variational LSTM outperforms existing methods. Moreover, a new dynamic facial expression dataset with different modes of variation, including various modes like pose and illumination variations, was collected to comprehensively evaluate the proposed mode variational LSTM. Experimental results verified that the proposed mode variational LSTM encodes spatio-temporal features robust to unseen modes of variation.</abstract><doi>10.48550/arxiv.1811.06937</doi><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier	DOI: 10.48550/arxiv.1811.06937
ispartof
issn
language	eng
recordid	cdi_arxiv_primary_1811_06937
source	arXiv.org
subjects	Computer Science - Computer Vision and Pattern Recognition ; Computer Science - Learning
title	Mode Variational LSTM Robust to Unseen Modes of Variation: Application to Facial Expression Recognition
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-18T20%3A02%3A28IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Mode%20Variational%20LSTM%20Robust%20to%20Unseen%20Modes%20of%20Variation:%20Application%20to%20Facial%20Expression%20Recognition&rft.au=Baddar,%20Wissam%20J&rft.date=2018-11-16&rft_id=info:doi/10.48550/arxiv.1811.06937&rft_dat=%3Carxiv_GOX%3E1811_06937%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true