Multi-Modal Continuous Valence And Arousal Prediction in the Wild Using Deep 3D Features and Sequence Modeling

Continuous affect prediction in the wild is a challenging problem, in part because frame-level prediction is computationally demanding. This paper presents the methodology used in our submission to the ABAW competition to predict the continuous emotion dimensions, i.e., valence and arousal, on the Aff-Wild2 database.

Full Description

Bibliographic Details
Main Authors: Rasipuram, Sowmya; Bhat, Junaid Hamid; Maitra, Anutosh
Format: Article
Language: eng
Subjects: Computer Science - Sound
description Continuous affect prediction in the wild is a challenging problem, in part because frame-level prediction is computationally demanding. This paper presents the methodology used in our submission to the ABAW competition to predict the continuous emotion dimensions, i.e., valence and arousal, on the Aff-Wild2 database. Aff-Wild2 consists of in-the-wild videos annotated for valence and arousal at the frame level. Our proposed method fuses audio and video features (multi-modal) extracted with state-of-the-art techniques. These audio-video features are used to train a sequence-to-sequence model based on Gated Recurrent Units (GRUs). We show promising results on the validation data with a simple architecture. The proposed approach achieves overall valence and arousal scores of 0.22 and 0.34, improving on the competition baselines of 0.14 and 0.24, respectively.
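The abstract describes a GRU-based sequence model trained on fused audio-video features. As a rough illustration only (this is not the authors' implementation: the weight layout, the early-fusion-by-concatenation choice, and the tanh readout to a (valence, arousal) pair are assumptions made for the sketch), a single-layer GRU forward pass over fused per-frame features can be written in plain Python:

```python
import math

def matvec(W, v):
    # matrix-vector product over plain lists
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-x)) for x in v]

def tanh(v):
    return [math.tanh(x) for x in v]

def gru_step(x, h, p):
    """One GRU step. p holds weight matrices Wz, Uz, Wr, Ur, Wh, Uh
    (biases omitted for brevity; names are this sketch's convention)."""
    z = sigmoid(add(matvec(p["Wz"], x), matvec(p["Uz"], h)))      # update gate
    r = sigmoid(add(matvec(p["Wr"], x), matvec(p["Ur"], h)))      # reset gate
    rh = [ri * hi for ri, hi in zip(r, h)]
    h_tilde = tanh(add(matvec(p["Wh"], x), matvec(p["Uh"], rh)))  # candidate state
    return [(1 - zi) * hi + zi * hti for zi, hi, hti in zip(z, h, h_tilde)]

def run_sequence(frames_audio, frames_video, p, hidden):
    """Emit one (valence, arousal) prediction per frame, each in (-1, 1)."""
    h = [0.0] * hidden
    outs = []
    for a, v in zip(frames_audio, frames_video):
        x = a + v                      # early fusion: concatenate audio + video features
        h = gru_step(x, h, p)
        # linear readout squashed by tanh to the valence/arousal range
        va = [math.tanh(sum(wo * hi for wo, hi in zip(row, h))) for row in p["Wo"]]
        outs.append(va)
    return outs
```

A real system would of course use a deep-learning framework with learned weights; the point here is only the data flow: per-frame fused features in, a recurrent state carried across frames, and a bounded two-dimensional prediction out at every frame.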
doi_str_mv 10.48550/arxiv.2002.12766
format Article
identifier DOI: 10.48550/arxiv.2002.12766
language eng
recordid cdi_arxiv_primary_2002_12766
source arXiv.org
subjects Computer Science - Sound
title Multi-Modal Continuous Valence And Arousal Prediction in the Wild Using Deep 3D Features and Sequence Modeling