Multi-Modal Continuous Valence And Arousal Prediction in the Wild Using Deep 3D Features and Sequence Modeling

Continuous affect prediction in the wild is a challenging problem, in part because frame-level prediction is computationally demanding. This paper presents the methodology used in our submission to the ABAW competition to predict the continuous emotion dimensions, i.e., valence and arousal, on the Aff-Wild2 database.

Full Description

Bibliographic Details
Main Authors: Rasipuram, Sowmya; Bhat, Junaid Hamid; Maitra, Anutosh
Format: Article
Language: eng
Subjects: Computer Science - Sound
description Continuous affect prediction in the wild is a challenging problem, in part because frame-level prediction is computationally demanding. This paper presents the methodology used in our submission to the ABAW competition to predict the continuous emotion dimensions, i.e., valence and arousal, on the Aff-Wild2 database. Aff-Wild2 consists of in-the-wild videos annotated for valence and arousal at the frame level. Our proposed method fuses audio and video features (multi-modal) extracted with state-of-the-art techniques. These audio-video features are used to train a sequence-to-sequence model based on Gated Recurrent Units (GRUs). We show promising results on the validation data with a simple architecture. The proposed approach achieves overall valence and arousal scores of 0.22 and 0.34, improving on the competition baselines of 0.14 and 0.24, respectively.
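The abstract describes a GRU-based sequence model trained on fused audio-video features. As a rough illustration only (this is not the authors' implementation: the weight layout, the early-fusion-by-concatenation choice, and the tanh readout to a (valence, arousal) pair are assumptions made for the sketch), a single-layer GRU forward pass over fused per-frame features can be written in plain Python:

```python
import math

def matvec(W, v):
    # matrix-vector product over plain lists
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-x)) for x in v]

def tanh(v):
    return [math.tanh(x) for x in v]

def gru_step(x, h, p):
    """One GRU step. p holds weight matrices Wz, Uz, Wr, Ur, Wh, Uh
    (biases omitted for brevity; names are this sketch's convention)."""
    z = sigmoid(add(matvec(p["Wz"], x), matvec(p["Uz"], h)))      # update gate
    r = sigmoid(add(matvec(p["Wr"], x), matvec(p["Ur"], h)))      # reset gate
    rh = [ri * hi for ri, hi in zip(r, h)]
    h_tilde = tanh(add(matvec(p["Wh"], x), matvec(p["Uh"], rh)))  # candidate state
    return [(1 - zi) * hi + zi * hti for zi, hi, hti in zip(z, h, h_tilde)]

def run_sequence(frames_audio, frames_video, p, hidden):
    """Emit one (valence, arousal) prediction per frame, each in (-1, 1)."""
    h = [0.0] * hidden
    outs = []
    for a, v in zip(frames_audio, frames_video):
        x = a + v                      # early fusion: concatenate audio + video features
        h = gru_step(x, h, p)
        # linear readout squashed by tanh to the valence/arousal range
        va = [math.tanh(sum(wo * hi for wo, hi in zip(row, h))) for row in p["Wo"]]
        outs.append(va)
    return outs
```

A real system would of course use a deep-learning framework with learned weights; the point here is only the data flow: per-frame fused features in, a recurrent state carried across frames, and a bounded two-dimensional prediction out at every frame.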
doi_str_mv 10.48550/arxiv.2002.12766
format Article
identifier DOI: 10.48550/arxiv.2002.12766
language eng
recordid cdi_arxiv_primary_2002_12766
source arXiv.org
subjects Computer Science - Sound
title Multi-Modal Continuous Valence And Arousal Prediction in the Wild Using Deep 3D Features and Sequence Modeling