TAPE: Temporal Attention-based Probabilistic human pose and shape Estimation

Reconstructing 3D human pose and shape from monocular videos is a well-studied but challenging problem. Common challenges include occlusions, the inherent ambiguities in the 2D to 3D mapping and the computational complexity of video processing. Existing methods ignore the ambiguities of the reconstr...

Bibliographic details
Published in:	arXiv.org 2023-04
Main authors:	Vasilikopoulos, Nikolaos; Kolotouros, Nikos; Tsoli, Aggeliki; Argyros, Antonis
Format:	Article
Language:	eng
Subjects:	Computer Science - Computer Vision and Pattern Recognition; Image processing; Image reconstruction; Neural networks; Optimization; Statistical analysis; Video
Online access:	Full text

container_end_page
container_issue
container_start_page
container_title	arXiv.org
container_volume
creator	Vasilikopoulos, Nikolaos; Kolotouros, Nikos; Tsoli, Aggeliki; Argyros, Antonis
description	Reconstructing 3D human pose and shape from monocular videos is a well-studied but challenging problem. Common challenges include occlusions, the inherent ambiguities in the 2D to 3D mapping and the computational complexity of video processing. Existing methods ignore the ambiguities of the reconstruction and provide a single deterministic estimate for the 3D pose. In order to address these issues, we present a Temporal Attention based Probabilistic human pose and shape Estimation method (TAPE) that operates on an RGB video. More specifically, we propose to use a neural network to encode video frames to temporal features using an attention-based neural network. Given these features, we output a per-frame but temporally-informed probability distribution for the human pose using Normalizing Flows. We show that TAPE outperforms state-of-the-art methods in standard benchmarks and serves as an effective video-based prior for optimization-based human pose and shape estimation. Code is available at: https://github.com/nikosvasilik/TAPE
doi_str_mv	10.48550/arxiv.2305.00181
format	Article
fullrecord	<record><control><sourceid>proquest_arxiv</sourceid><recordid>TN_cdi_arxiv_primary_2305_00181</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2808431869</sourcerecordid><originalsourceid>FETCH-LOGICAL-a529-23b6d06cb8e3a2994d8fc6cbe4530982efa405cbb19ff37fc7161134072f046e3</originalsourceid><addsrcrecordid>eNotj0trwzAQhEWh0JDmB_RUQc92V0_LvZngPsDQHHw3ki0RB78q2aX993WSnpZhZ3bnQ-iBQMyVEPCs_U_7HVMGIgYgitygDWWMRIpTeod2IZwAgMqECsE2qCizQ_6CS9tPo9cdzubZDnM7DpHRwTb44EejTdu1YW5rfFx6PeBpDBbrocHhqCeL83XV63PmHt063QW7-59bVL7m5f49Kj7fPvZZEWlB04gyIxuQtVGWaZqmvFGuXqXlgkGqqHWag6iNIalzLHF1QiQhjENCHXBp2RY9Xs9eUKvJr-_9b3VGri7Iq-Pp6pj8-LXYMFencfHD2qmiChRnRMmU_QFS9lip</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2808431869</pqid></control><display><type>article</type><title>TAPE: Temporal Attention-based Probabilistic human pose and shape Estimation</title><source>Freely Accessible Journals</source><source>arXiv.org</source><creator>Vasilikopoulos, Nikolaos ; Kolotouros, Nikos ; Tsoli, Aggeliki ; Argyros, Antonis</creator><creatorcontrib>Vasilikopoulos, Nikolaos ; Kolotouros, Nikos ; Tsoli, Aggeliki ; Argyros, Antonis</creatorcontrib><description>Reconstructing 3D human pose and shape from monocular videos is a well-studied but challenging problem. Common challenges include occlusions, the inherent ambiguities in the 2D to 3D mapping and the computational complexity of video processing. Existing methods ignore the ambiguities of the reconstruction and provide a single deterministic estimate for the 3D pose. In order to address these issues, we present a Temporal Attention based Probabilistic human pose and shape Estimation method (TAPE) that operates on an RGB video. More specifically, we propose to use a neural network to encode video frames to temporal features using an attention-based neural network. 
Given these features, we output a per-frame but temporally-informed probability distribution for the human pose using Normalizing Flows. We show that TAPE outperforms state-of-the-art methods in standard benchmarks and serves as an effective video-based prior for optimization-based human pose and shape estimation. Code is available at: https: //github.com/nikosvasilik/TAPE</description><identifier>EISSN: 2331-8422</identifier><identifier>DOI: 10.48550/arxiv.2305.00181</identifier><language>eng</language><publisher>Ithaca: Cornell University Library, arXiv.org</publisher><subject>Computer Science - Computer Vision and Pattern Recognition ; Image processing ; Image reconstruction ; Neural networks ; Optimization ; Statistical analysis ; Video</subject><ispartof>arXiv.org, 2023-04</ispartof><rights>2023. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,781,785,886,27930</link.rule.ids><backlink>$$Uhttps://doi.org/10.48550/arXiv.2305.00181$$DView paper in arXiv$$Hfree_for_read</backlink><backlink>$$Uhttps://doi.org/10.1007/978-3-031-31438-4_28$$DView published paper (Access to full text may be restricted)$$Hfree_for_read</backlink></links><search><creatorcontrib>Vasilikopoulos, Nikolaos</creatorcontrib><creatorcontrib>Kolotouros, Nikos</creatorcontrib><creatorcontrib>Tsoli, Aggeliki</creatorcontrib><creatorcontrib>Argyros, Antonis</creatorcontrib><title>TAPE: Temporal Attention-based Probabilistic human pose and shape 
Estimation</title><title>arXiv.org</title><description>Reconstructing 3D human pose and shape from monocular videos is a well-studied but challenging problem. Common challenges include occlusions, the inherent ambiguities in the 2D to 3D mapping and the computational complexity of video processing. Existing methods ignore the ambiguities of the reconstruction and provide a single deterministic estimate for the 3D pose. In order to address these issues, we present a Temporal Attention based Probabilistic human pose and shape Estimation method (TAPE) that operates on an RGB video. More specifically, we propose to use a neural network to encode video frames to temporal features using an attention-based neural network. Given these features, we output a per-frame but temporally-informed probability distribution for the human pose using Normalizing Flows. We show that TAPE outperforms state-of-the-art methods in standard benchmarks and serves as an effective video-based prior for optimization-based human pose and shape estimation. 
Code is available at: https: //github.com/nikosvasilik/TAPE</description><subject>Computer Science - Computer Vision and Pattern Recognition</subject><subject>Image processing</subject><subject>Image reconstruction</subject><subject>Neural networks</subject><subject>Optimization</subject><subject>Statistical analysis</subject><subject>Video</subject><issn>2331-8422</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><sourceid>GOX</sourceid><recordid>eNotj0trwzAQhEWh0JDmB_RUQc92V0_LvZngPsDQHHw3ki0RB78q2aX993WSnpZhZ3bnQ-iBQMyVEPCs_U_7HVMGIgYgitygDWWMRIpTeod2IZwAgMqECsE2qCizQ_6CS9tPo9cdzubZDnM7DpHRwTb44EejTdu1YW5rfFx6PeBpDBbrocHhqCeL83XV63PmHt063QW7-59bVL7m5f49Kj7fPvZZEWlB04gyIxuQtVGWaZqmvFGuXqXlgkGqqHWag6iNIalzLHF1QiQhjENCHXBp2RY9Xs9eUKvJr-_9b3VGri7Iq-Pp6pj8-LXYMFencfHD2qmiChRnRMmU_QFS9lip</recordid><startdate>20230429</startdate><enddate>20230429</enddate><creator>Vasilikopoulos, Nikolaos</creator><creator>Kolotouros, Nikos</creator><creator>Tsoli, Aggeliki</creator><creator>Argyros, Antonis</creator><general>Cornell University Library, arXiv.org</general><scope>8FE</scope><scope>8FG</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>HCIFZ</scope><scope>L6V</scope><scope>M7S</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20230429</creationdate><title>TAPE: Temporal Attention-based Probabilistic human pose and shape Estimation</title><author>Vasilikopoulos, Nikolaos ; Kolotouros, Nikos ; Tsoli, Aggeliki ; Argyros, 
Antonis</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a529-23b6d06cb8e3a2994d8fc6cbe4530982efa405cbb19ff37fc7161134072f046e3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Computer Science - Computer Vision and Pattern Recognition</topic><topic>Image processing</topic><topic>Image reconstruction</topic><topic>Neural networks</topic><topic>Optimization</topic><topic>Statistical analysis</topic><topic>Video</topic><toplevel>online_resources</toplevel><creatorcontrib>Vasilikopoulos, Nikolaos</creatorcontrib><creatorcontrib>Kolotouros, Nikos</creatorcontrib><creatorcontrib>Tsoli, Aggeliki</creatorcontrib><creatorcontrib>Argyros, Antonis</creatorcontrib><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Materials Science & Engineering Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Engineering Collection</collection><collection>Engineering Database</collection><collection>Access via ProQuest (Open Access)</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering Collection</collection><collection>arXiv Computer Science</collection><collection>arXiv.org</collection><jtitle>arXiv.org</jtitle></facets><delivery><delcategory>Remote Search 
Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Vasilikopoulos, Nikolaos</au><au>Kolotouros, Nikos</au><au>Tsoli, Aggeliki</au><au>Argyros, Antonis</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>TAPE: Temporal Attention-based Probabilistic human pose and shape Estimation</atitle><jtitle>arXiv.org</jtitle><date>2023-04-29</date><risdate>2023</risdate><eissn>2331-8422</eissn><abstract>Reconstructing 3D human pose and shape from monocular videos is a well-studied but challenging problem. Common challenges include occlusions, the inherent ambiguities in the 2D to 3D mapping and the computational complexity of video processing. Existing methods ignore the ambiguities of the reconstruction and provide a single deterministic estimate for the 3D pose. In order to address these issues, we present a Temporal Attention based Probabilistic human pose and shape Estimation method (TAPE) that operates on an RGB video. More specifically, we propose to use a neural network to encode video frames to temporal features using an attention-based neural network. Given these features, we output a per-frame but temporally-informed probability distribution for the human pose using Normalizing Flows. We show that TAPE outperforms state-of-the-art methods in standard benchmarks and serves as an effective video-based prior for optimization-based human pose and shape estimation. Code is available at: https: //github.com/nikosvasilik/TAPE</abstract><cop>Ithaca</cop><pub>Cornell University Library, arXiv.org</pub><doi>10.48550/arxiv.2305.00181</doi><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	EISSN: 2331-8422
ispartof	arXiv.org, 2023-04
issn	2331-8422
language	eng
recordid	cdi_arxiv_primary_2305_00181
source	Freely Accessible Journals; arXiv.org
subjects	Computer Science - Computer Vision and Pattern Recognition; Image processing; Image reconstruction; Neural networks; Optimization; Statistical analysis; Video
title	TAPE: Temporal Attention-based Probabilistic human pose and shape Estimation
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-13T10%3A56%3A39IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_arxiv&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=TAPE:%20Temporal%20Attention-based%20Probabilistic%20human%20pose%20and%20shape%20Estimation&rft.jtitle=arXiv.org&rft.au=Vasilikopoulos,%20Nikolaos&rft.date=2023-04-29&rft.eissn=2331-8422&rft_id=info:doi/10.48550/arxiv.2305.00181&rft_dat=%3Cproquest_arxiv%3E2808431869%3C/proquest_arxiv%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2808431869&rft_id=info:pmid/&rfr_iscdi=true