Mode Variational LSTM Robust to Unseen Modes of Variation: Application to Facial Expression Recognition
Spatio-temporal feature encoding is essential for encoding the dynamics in video sequences. Recurrent neural networks, particularly long short-term memory (LSTM) units, have been popular as an efficient tool for encoding spatio-temporal features in sequences. In this work, we investigate the effect...
Gespeichert in:
Hauptverfasser: | , |
---|---|
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | |
container_volume | |
creator | Baddar, Wissam J Ro, Yong Man |
description | Spatio-temporal feature encoding is essential for encoding the dynamics in
video sequences. Recurrent neural networks, particularly long short-term memory
(LSTM) units, have been popular as an efficient tool for encoding
spatio-temporal features in sequences. In this work, we investigate the effect
of mode variations on the encoded spatio-temporal features using LSTMs. We show
that the LSTM retains information related to the mode variation in the
sequence, which is irrelevant to the task at hand (e.g. classification facial
expressions). Actually, the LSTM forget mechanism is not robust enough to mode
variations and preserves information that could negatively affect the encoded
spatio-temporal features. We propose the mode variational LSTM to encode
spatio-temporal features robust to unseen modes of variation. The mode
variational LSTM modifies the original LSTM structure by adding an additional
cell state that focuses on encoding the mode variation in the input sequence.
To efficiently regulate what features should be stored in the additional cell
state, additional gating functionality is also introduced. The effectiveness of
the proposed mode variational LSTM is verified using the facial expression
recognition task. Comparative experiments on publicly available datasets
verified that the proposed mode variational LSTM outperforms existing methods.
Moreover, a new dynamic facial expression dataset with different modes of
variation, including various modes like pose and illumination variations, was
collected to comprehensively evaluate the proposed mode variational LSTM.
Experimental results verified that the proposed mode variational LSTM encodes
spatio-temporal features robust to unseen modes of variation. |
doi_str_mv | 10.48550/arxiv.1811.06937 |
format | Article |
fullrecord | <record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_1811_06937</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>1811_06937</sourcerecordid><originalsourceid>FETCH-LOGICAL-a677-a4871be426665af3150338a3c0495053882433aa3261834c03f8354f72d692f63</originalsourceid><addsrcrecordid>eNpFj9FKwzAUhnPjhUwfwCvzAq1JTpKmuxtjU6FDmHW35axLRqA2paky315TBa8O_-H7f_gIueMsl0Yp9oDjxX_m3HCeM11CcU3Ou3Cy9ICjx8mHHjtavdY7ug_HjzjRKdC3Plrb04RFGtw_uqSrYeh8O4dEbrH1P_3NZRhtjOm5t2049z4BN-TKYRft7d9dkHq7qddPWfXy-LxeVRnqoshQmoIfrRRaa4UOuGIABqFlslRMgTFCAiCC0NyAbBk4A0q6Qpx0KZyGBbn_nZ1Nm2H07zh-Ncm4mY3hG4o-T8M</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Mode Variational LSTM Robust to Unseen Modes of Variation: Application to Facial Expression Recognition</title><source>arXiv.org</source><creator>Baddar, Wissam J ; Ro, Yong Man</creator><creatorcontrib>Baddar, Wissam J ; Ro, Yong Man</creatorcontrib><description>Spatio-temporal feature encoding is essential for encoding the dynamics in
video sequences. Recurrent neural networks, particularly long short-term memory
(LSTM) units, have been popular as an efficient tool for encoding
spatio-temporal features in sequences. In this work, we investigate the effect
of mode variations on the encoded spatio-temporal features using LSTMs. We show
that the LSTM retains information related to the mode variation in the
sequence, which is irrelevant to the task at hand (e.g. classification facial
expressions). Actually, the LSTM forget mechanism is not robust enough to mode
variations and preserves information that could negatively affect the encoded
spatio-temporal features. We propose the mode variational LSTM to encode
spatio-temporal features robust to unseen modes of variation. The mode
variational LSTM modifies the original LSTM structure by adding an additional
cell state that focuses on encoding the mode variation in the input sequence.
To efficiently regulate what features should be stored in the additional cell
state, additional gating functionality is also introduced. The effectiveness of
the proposed mode variational LSTM is verified using the facial expression
recognition task. Comparative experiments on publicly available datasets
verified that the proposed mode variational LSTM outperforms existing methods.
Moreover, a new dynamic facial expression dataset with different modes of
variation, including various modes like pose and illumination variations, was
collected to comprehensively evaluate the proposed mode variational LSTM.
Experimental results verified that the proposed mode variational LSTM encodes
spatio-temporal features robust to unseen modes of variation.</description><identifier>DOI: 10.48550/arxiv.1811.06937</identifier><language>eng</language><subject>Computer Science - Computer Vision and Pattern Recognition ; Computer Science - Learning</subject><creationdate>2018-11</creationdate><rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,777,882</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/1811.06937$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.1811.06937$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Baddar, Wissam J</creatorcontrib><creatorcontrib>Ro, Yong Man</creatorcontrib><title>Mode Variational LSTM Robust to Unseen Modes of Variation: Application to Facial Expression Recognition</title><description>Spatio-temporal feature encoding is essential for encoding the dynamics in
video sequences. Recurrent neural networks, particularly long short-term memory
(LSTM) units, have been popular as an efficient tool for encoding
spatio-temporal features in sequences. In this work, we investigate the effect
of mode variations on the encoded spatio-temporal features using LSTMs. We show
that the LSTM retains information related to the mode variation in the
sequence, which is irrelevant to the task at hand (e.g. classification facial
expressions). Actually, the LSTM forget mechanism is not robust enough to mode
variations and preserves information that could negatively affect the encoded
spatio-temporal features. We propose the mode variational LSTM to encode
spatio-temporal features robust to unseen modes of variation. The mode
variational LSTM modifies the original LSTM structure by adding an additional
cell state that focuses on encoding the mode variation in the input sequence.
To efficiently regulate what features should be stored in the additional cell
state, additional gating functionality is also introduced. The effectiveness of
the proposed mode variational LSTM is verified using the facial expression
recognition task. Comparative experiments on publicly available datasets
verified that the proposed mode variational LSTM outperforms existing methods.
Moreover, a new dynamic facial expression dataset with different modes of
variation, including various modes like pose and illumination variations, was
collected to comprehensively evaluate the proposed mode variational LSTM.
Experimental results verified that the proposed mode variational LSTM encodes
spatio-temporal features robust to unseen modes of variation.</description><subject>Computer Science - Computer Vision and Pattern Recognition</subject><subject>Computer Science - Learning</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2018</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNpFj9FKwzAUhnPjhUwfwCvzAq1JTpKmuxtjU6FDmHW35axLRqA2paky315TBa8O_-H7f_gIueMsl0Yp9oDjxX_m3HCeM11CcU3Ou3Cy9ICjx8mHHjtavdY7ug_HjzjRKdC3Plrb04RFGtw_uqSrYeh8O4dEbrH1P_3NZRhtjOm5t2049z4BN-TKYRft7d9dkHq7qddPWfXy-LxeVRnqoshQmoIfrRRaa4UOuGIABqFlslRMgTFCAiCC0NyAbBk4A0q6Qpx0KZyGBbn_nZ1Nm2H07zh-Ncm4mY3hG4o-T8M</recordid><startdate>20181116</startdate><enddate>20181116</enddate><creator>Baddar, Wissam J</creator><creator>Ro, Yong Man</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20181116</creationdate><title>Mode Variational LSTM Robust to Unseen Modes of Variation: Application to Facial Expression Recognition</title><author>Baddar, Wissam J ; Ro, Yong Man</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a677-a4871be426665af3150338a3c0495053882433aa3261834c03f8354f72d692f63</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2018</creationdate><topic>Computer Science - Computer Vision and Pattern Recognition</topic><topic>Computer Science - Learning</topic><toplevel>online_resources</toplevel><creatorcontrib>Baddar, Wissam J</creatorcontrib><creatorcontrib>Ro, Yong Man</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Baddar, Wissam J</au><au>Ro, Yong Man</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Mode Variational LSTM Robust to Unseen Modes of Variation: Application to Facial Expression Recognition</atitle><date>2018-11-16</date><risdate>2018</risdate><abstract>Spatio-temporal feature encoding is essential for encoding the dynamics in
video sequences. Recurrent neural networks, particularly long short-term memory
(LSTM) units, have been popular as an efficient tool for encoding
spatio-temporal features in sequences. In this work, we investigate the effect
of mode variations on the encoded spatio-temporal features using LSTMs. We show
that the LSTM retains information related to the mode variation in the
sequence, which is irrelevant to the task at hand (e.g. classification facial
expressions). Actually, the LSTM forget mechanism is not robust enough to mode
variations and preserves information that could negatively affect the encoded
spatio-temporal features. We propose the mode variational LSTM to encode
spatio-temporal features robust to unseen modes of variation. The mode
variational LSTM modifies the original LSTM structure by adding an additional
cell state that focuses on encoding the mode variation in the input sequence.
To efficiently regulate what features should be stored in the additional cell
state, additional gating functionality is also introduced. The effectiveness of
the proposed mode variational LSTM is verified using the facial expression
recognition task. Comparative experiments on publicly available datasets
verified that the proposed mode variational LSTM outperforms existing methods.
Moreover, a new dynamic facial expression dataset with different modes of
variation, including various modes like pose and illumination variations, was
collected to comprehensively evaluate the proposed mode variational LSTM.
Experimental results verified that the proposed mode variational LSTM encodes
spatio-temporal features robust to unseen modes of variation.</abstract><doi>10.48550/arxiv.1811.06937</doi><oa>free_for_read</oa></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.1811.06937 |
ispartof | |
issn | |
language | eng |
recordid | cdi_arxiv_primary_1811_06937 |
source | arXiv.org |
subjects | Computer Science - Computer Vision and Pattern Recognition Computer Science - Learning |
title | Mode Variational LSTM Robust to Unseen Modes of Variation: Application to Facial Expression Recognition |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-18T20%3A02%3A28IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Mode%20Variational%20LSTM%20Robust%20to%20Unseen%20Modes%20of%20Variation:%20Application%20to%20Facial%20Expression%20Recognition&rft.au=Baddar,%20Wissam%20J&rft.date=2018-11-16&rft_id=info:doi/10.48550/arxiv.1811.06937&rft_dat=%3Carxiv_GOX%3E1811_06937%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |