TAPE: Temporal Attention-based Probabilistic human pose and shape Estimation

Reconstructing 3D human pose and shape from monocular videos is a well-studied but challenging problem. Common challenges include occlusions, the inherent ambiguities in the 2D to 3D mapping and the computational complexity of video processing. Existing methods ignore the ambiguities of the reconstr...

Bibliographic Details
Published in: arXiv.org 2023-04
Main authors: Vasilikopoulos, Nikolaos; Kolotouros, Nikos; Tsoli, Aggeliki; Argyros, Antonis
Format: Article
Language: eng
Subjects:
Online access: Full text
container_title arXiv.org
creator Vasilikopoulos, Nikolaos
Kolotouros, Nikos
Tsoli, Aggeliki
Argyros, Antonis
description Reconstructing 3D human pose and shape from monocular videos is a well-studied but challenging problem. Common challenges include occlusions, the inherent ambiguities in the 2D to 3D mapping and the computational complexity of video processing. Existing methods ignore the ambiguities of the reconstruction and provide a single deterministic estimate for the 3D pose. In order to address these issues, we present a Temporal Attention based Probabilistic human pose and shape Estimation method (TAPE) that operates on an RGB video. More specifically, we propose to use a neural network to encode video frames to temporal features using an attention-based neural network. Given these features, we output a per-frame but temporally-informed probability distribution for the human pose using Normalizing Flows. We show that TAPE outperforms state-of-the-art methods in standard benchmarks and serves as an effective video-based prior for optimization-based human pose and shape estimation. Code is available at: https://github.com/nikosvasilik/TAPE
doi_str_mv 10.48550/arxiv.2305.00181
format Article
fullrecord <record><control><sourceid>proquest_arxiv</sourceid><recordid>TN_cdi_arxiv_primary_2305_00181</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2808431869</sourcerecordid><originalsourceid>FETCH-LOGICAL-a529-23b6d06cb8e3a2994d8fc6cbe4530982efa405cbb19ff37fc7161134072f046e3</originalsourceid><addsrcrecordid>eNotj0trwzAQhEWh0JDmB_RUQc92V0_LvZngPsDQHHw3ki0RB78q2aX993WSnpZhZ3bnQ-iBQMyVEPCs_U_7HVMGIgYgitygDWWMRIpTeod2IZwAgMqECsE2qCizQ_6CS9tPo9cdzubZDnM7DpHRwTb44EejTdu1YW5rfFx6PeBpDBbrocHhqCeL83XV63PmHt063QW7-59bVL7m5f49Kj7fPvZZEWlB04gyIxuQtVGWaZqmvFGuXqXlgkGqqHWag6iNIalzLHF1QiQhjENCHXBp2RY9Xs9eUKvJr-_9b3VGri7Iq-Pp6pj8-LXYMFencfHD2qmiChRnRMmU_QFS9lip</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2808431869</pqid></control><display><type>article</type><title>TAPE: Temporal Attention-based Probabilistic human pose and shape Estimation</title><source>Freely Accessible Journals</source><source>arXiv.org</source><creator>Vasilikopoulos, Nikolaos ; Kolotouros, Nikos ; Tsoli, Aggeliki ; Argyros, Antonis</creator><creatorcontrib>Vasilikopoulos, Nikolaos ; Kolotouros, Nikos ; Tsoli, Aggeliki ; Argyros, Antonis</creatorcontrib><description>Reconstructing 3D human pose and shape from monocular videos is a well-studied but challenging problem. Common challenges include occlusions, the inherent ambiguities in the 2D to 3D mapping and the computational complexity of video processing. Existing methods ignore the ambiguities of the reconstruction and provide a single deterministic estimate for the 3D pose. In order to address these issues, we present a Temporal Attention based Probabilistic human pose and shape Estimation method (TAPE) that operates on an RGB video. More specifically, we propose to use a neural network to encode video frames to temporal features using an attention-based neural network. 
Given these features, we output a per-frame but temporally-informed probability distribution for the human pose using Normalizing Flows. We show that TAPE outperforms state-of-the-art methods in standard benchmarks and serves as an effective video-based prior for optimization-based human pose and shape estimation. Code is available at: https: //github.com/nikosvasilik/TAPE</description><identifier>EISSN: 2331-8422</identifier><identifier>DOI: 10.48550/arxiv.2305.00181</identifier><language>eng</language><publisher>Ithaca: Cornell University Library, arXiv.org</publisher><subject>Computer Science - Computer Vision and Pattern Recognition ; Image processing ; Image reconstruction ; Neural networks ; Optimization ; Statistical analysis ; Video</subject><ispartof>arXiv.org, 2023-04</ispartof><rights>2023. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,781,785,886,27930</link.rule.ids><backlink>$$Uhttps://doi.org/10.48550/arXiv.2305.00181$$DView paper in arXiv$$Hfree_for_read</backlink><backlink>$$Uhttps://doi.org/10.1007/978-3-031-31438-4_28$$DView published paper (Access to full text may be restricted)$$Hfree_for_read</backlink></links><search><creatorcontrib>Vasilikopoulos, Nikolaos</creatorcontrib><creatorcontrib>Kolotouros, Nikos</creatorcontrib><creatorcontrib>Tsoli, Aggeliki</creatorcontrib><creatorcontrib>Argyros, Antonis</creatorcontrib><title>TAPE: Temporal Attention-based Probabilistic human pose and shape 
Estimation</title><title>arXiv.org</title><description>Reconstructing 3D human pose and shape from monocular videos is a well-studied but challenging problem. Common challenges include occlusions, the inherent ambiguities in the 2D to 3D mapping and the computational complexity of video processing. Existing methods ignore the ambiguities of the reconstruction and provide a single deterministic estimate for the 3D pose. In order to address these issues, we present a Temporal Attention based Probabilistic human pose and shape Estimation method (TAPE) that operates on an RGB video. More specifically, we propose to use a neural network to encode video frames to temporal features using an attention-based neural network. Given these features, we output a per-frame but temporally-informed probability distribution for the human pose using Normalizing Flows. We show that TAPE outperforms state-of-the-art methods in standard benchmarks and serves as an effective video-based prior for optimization-based human pose and shape estimation. 
Code is available at: https: //github.com/nikosvasilik/TAPE</description><subject>Computer Science - Computer Vision and Pattern Recognition</subject><subject>Image processing</subject><subject>Image reconstruction</subject><subject>Neural networks</subject><subject>Optimization</subject><subject>Statistical analysis</subject><subject>Video</subject><issn>2331-8422</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><sourceid>GOX</sourceid><recordid>eNotj0trwzAQhEWh0JDmB_RUQc92V0_LvZngPsDQHHw3ki0RB78q2aX993WSnpZhZ3bnQ-iBQMyVEPCs_U_7HVMGIgYgitygDWWMRIpTeod2IZwAgMqECsE2qCizQ_6CS9tPo9cdzubZDnM7DpHRwTb44EejTdu1YW5rfFx6PeBpDBbrocHhqCeL83XV63PmHt063QW7-59bVL7m5f49Kj7fPvZZEWlB04gyIxuQtVGWaZqmvFGuXqXlgkGqqHWag6iNIalzLHF1QiQhjENCHXBp2RY9Xs9eUKvJr-_9b3VGri7Iq-Pp6pj8-LXYMFencfHD2qmiChRnRMmU_QFS9lip</recordid><startdate>20230429</startdate><enddate>20230429</enddate><creator>Vasilikopoulos, Nikolaos</creator><creator>Kolotouros, Nikos</creator><creator>Tsoli, Aggeliki</creator><creator>Argyros, Antonis</creator><general>Cornell University Library, arXiv.org</general><scope>8FE</scope><scope>8FG</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>HCIFZ</scope><scope>L6V</scope><scope>M7S</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20230429</creationdate><title>TAPE: Temporal Attention-based Probabilistic human pose and shape Estimation</title><author>Vasilikopoulos, Nikolaos ; Kolotouros, Nikos ; Tsoli, Aggeliki ; Argyros, 
Antonis</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a529-23b6d06cb8e3a2994d8fc6cbe4530982efa405cbb19ff37fc7161134072f046e3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Computer Science - Computer Vision and Pattern Recognition</topic><topic>Image processing</topic><topic>Image reconstruction</topic><topic>Neural networks</topic><topic>Optimization</topic><topic>Statistical analysis</topic><topic>Video</topic><toplevel>online_resources</toplevel><creatorcontrib>Vasilikopoulos, Nikolaos</creatorcontrib><creatorcontrib>Kolotouros, Nikos</creatorcontrib><creatorcontrib>Tsoli, Aggeliki</creatorcontrib><creatorcontrib>Argyros, Antonis</creatorcontrib><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Materials Science &amp; Engineering Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Engineering Collection</collection><collection>Engineering Database</collection><collection>Access via ProQuest (Open Access)</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering Collection</collection><collection>arXiv Computer Science</collection><collection>arXiv.org</collection><jtitle>arXiv.org</jtitle></facets><delivery><delcategory>Remote Search 
Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Vasilikopoulos, Nikolaos</au><au>Kolotouros, Nikos</au><au>Tsoli, Aggeliki</au><au>Argyros, Antonis</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>TAPE: Temporal Attention-based Probabilistic human pose and shape Estimation</atitle><jtitle>arXiv.org</jtitle><date>2023-04-29</date><risdate>2023</risdate><eissn>2331-8422</eissn><abstract>Reconstructing 3D human pose and shape from monocular videos is a well-studied but challenging problem. Common challenges include occlusions, the inherent ambiguities in the 2D to 3D mapping and the computational complexity of video processing. Existing methods ignore the ambiguities of the reconstruction and provide a single deterministic estimate for the 3D pose. In order to address these issues, we present a Temporal Attention based Probabilistic human pose and shape Estimation method (TAPE) that operates on an RGB video. More specifically, we propose to use a neural network to encode video frames to temporal features using an attention-based neural network. Given these features, we output a per-frame but temporally-informed probability distribution for the human pose using Normalizing Flows. We show that TAPE outperforms state-of-the-art methods in standard benchmarks and serves as an effective video-based prior for optimization-based human pose and shape estimation. Code is available at: https: //github.com/nikosvasilik/TAPE</abstract><cop>Ithaca</cop><pub>Cornell University Library, arXiv.org</pub><doi>10.48550/arxiv.2305.00181</doi><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier EISSN: 2331-8422
ispartof arXiv.org, 2023-04
issn 2331-8422
language eng
recordid cdi_arxiv_primary_2305_00181
source Freely Accessible Journals; arXiv.org
subjects Computer Science - Computer Vision and Pattern Recognition
Image processing
Image reconstruction
Neural networks
Optimization
Statistical analysis
Video
title TAPE: Temporal Attention-based Probabilistic human pose and shape Estimation
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-13T10%3A56%3A39IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_arxiv&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=TAPE:%20Temporal%20Attention-based%20Probabilistic%20human%20pose%20and%20shape%20Estimation&rft.jtitle=arXiv.org&rft.au=Vasilikopoulos,%20Nikolaos&rft.date=2023-04-29&rft.eissn=2331-8422&rft_id=info:doi/10.48550/arxiv.2305.00181&rft_dat=%3Cproquest_arxiv%3E2808431869%3C/proquest_arxiv%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2808431869&rft_id=info:pmid/&rfr_iscdi=true