TAPE: Temporal Attention-based Probabilistic human pose and shape Estimation
Reconstructing 3D human pose and shape from monocular videos is a well-studied but challenging problem. Common challenges include occlusions, the inherent ambiguities in the 2D to 3D mapping and the computational complexity of video processing. Existing methods ignore the ambiguities of the reconstr...
Published in: | arXiv.org 2023-04 |
---|---|
Main authors: | Vasilikopoulos, Nikolaos; Kolotouros, Nikos; Tsoli, Aggeliki; Argyros, Antonis |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | arXiv.org |
container_volume | |
creator | Vasilikopoulos, Nikolaos; Kolotouros, Nikos; Tsoli, Aggeliki; Argyros, Antonis |
description | Reconstructing 3D human pose and shape from monocular videos is a well-studied but challenging problem. Common challenges include occlusions, the inherent ambiguities in the 2D to 3D mapping and the computational complexity of video processing. Existing methods ignore the ambiguities of the reconstruction and provide a single deterministic estimate for the 3D pose. In order to address these issues, we present a Temporal Attention based Probabilistic human pose and shape Estimation method (TAPE) that operates on an RGB video. More specifically, we propose to use a neural network to encode video frames to temporal features using an attention-based neural network. Given these features, we output a per-frame but temporally-informed probability distribution for the human pose using Normalizing Flows. We show that TAPE outperforms state-of-the-art methods in standard benchmarks and serves as an effective video-based prior for optimization-based human pose and shape estimation. Code is available at: https://github.com/nikosvasilik/TAPE |
doi_str_mv | 10.48550/arxiv.2305.00181 |
format | Article |
fullrecord | <record><control><sourceid>proquest_arxiv</sourceid><recordid>TN_cdi_arxiv_primary_2305_00181</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2808431869</sourcerecordid><originalsourceid>FETCH-LOGICAL-a529-23b6d06cb8e3a2994d8fc6cbe4530982efa405cbb19ff37fc7161134072f046e3</originalsourceid><addsrcrecordid>eNotj0trwzAQhEWh0JDmB_RUQc92V0_LvZngPsDQHHw3ki0RB78q2aX993WSnpZhZ3bnQ-iBQMyVEPCs_U_7HVMGIgYgitygDWWMRIpTeod2IZwAgMqECsE2qCizQ_6CS9tPo9cdzubZDnM7DpHRwTb44EejTdu1YW5rfFx6PeBpDBbrocHhqCeL83XV63PmHt063QW7-59bVL7m5f49Kj7fPvZZEWlB04gyIxuQtVGWaZqmvFGuXqXlgkGqqHWag6iNIalzLHF1QiQhjENCHXBp2RY9Xs9eUKvJr-_9b3VGri7Iq-Pp6pj8-LXYMFencfHD2qmiChRnRMmU_QFS9lip</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2808431869</pqid></control><display><type>article</type><title>TAPE: Temporal Attention-based Probabilistic human pose and shape Estimation</title><source>Freely Accessible Journals</source><source>arXiv.org</source><creator>Vasilikopoulos, Nikolaos ; Kolotouros, Nikos ; Tsoli, Aggeliki ; Argyros, Antonis</creator><creatorcontrib>Vasilikopoulos, Nikolaos ; Kolotouros, Nikos ; Tsoli, Aggeliki ; Argyros, Antonis</creatorcontrib><description>Reconstructing 3D human pose and shape from monocular videos is a well-studied but challenging problem. Common challenges include occlusions, the inherent ambiguities in the 2D to 3D mapping and the computational complexity of video processing. Existing methods ignore the ambiguities of the reconstruction and provide a single deterministic estimate for the 3D pose. In order to address these issues, we present a Temporal Attention based Probabilistic human pose and shape Estimation method (TAPE) that operates on an RGB video. More specifically, we propose to use a neural network to encode video frames to temporal features using an attention-based neural network. 
Given these features, we output a per-frame but temporally-informed probability distribution for the human pose using Normalizing Flows. We show that TAPE outperforms state-of-the-art methods in standard benchmarks and serves as an effective video-based prior for optimization-based human pose and shape estimation. Code is available at: https: //github.com/nikosvasilik/TAPE</description><identifier>EISSN: 2331-8422</identifier><identifier>DOI: 10.48550/arxiv.2305.00181</identifier><language>eng</language><publisher>Ithaca: Cornell University Library, arXiv.org</publisher><subject>Computer Science - Computer Vision and Pattern Recognition ; Image processing ; Image reconstruction ; Neural networks ; Optimization ; Statistical analysis ; Video</subject><ispartof>arXiv.org, 2023-04</ispartof><rights>2023. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,781,785,886,27930</link.rule.ids><backlink>$$Uhttps://doi.org/10.48550/arXiv.2305.00181$$DView paper in arXiv$$Hfree_for_read</backlink><backlink>$$Uhttps://doi.org/10.1007/978-3-031-31438-4_28$$DView published paper (Access to full text may be restricted)$$Hfree_for_read</backlink></links><search><creatorcontrib>Vasilikopoulos, Nikolaos</creatorcontrib><creatorcontrib>Kolotouros, Nikos</creatorcontrib><creatorcontrib>Tsoli, Aggeliki</creatorcontrib><creatorcontrib>Argyros, Antonis</creatorcontrib><title>TAPE: Temporal Attention-based Probabilistic human pose and shape 
Estimation</title><title>arXiv.org</title><description>Reconstructing 3D human pose and shape from monocular videos is a well-studied but challenging problem. Common challenges include occlusions, the inherent ambiguities in the 2D to 3D mapping and the computational complexity of video processing. Existing methods ignore the ambiguities of the reconstruction and provide a single deterministic estimate for the 3D pose. In order to address these issues, we present a Temporal Attention based Probabilistic human pose and shape Estimation method (TAPE) that operates on an RGB video. More specifically, we propose to use a neural network to encode video frames to temporal features using an attention-based neural network. Given these features, we output a per-frame but temporally-informed probability distribution for the human pose using Normalizing Flows. We show that TAPE outperforms state-of-the-art methods in standard benchmarks and serves as an effective video-based prior for optimization-based human pose and shape estimation. 
Code is available at: https: //github.com/nikosvasilik/TAPE</description><subject>Computer Science - Computer Vision and Pattern Recognition</subject><subject>Image processing</subject><subject>Image reconstruction</subject><subject>Neural networks</subject><subject>Optimization</subject><subject>Statistical analysis</subject><subject>Video</subject><issn>2331-8422</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><sourceid>GOX</sourceid><recordid>eNotj0trwzAQhEWh0JDmB_RUQc92V0_LvZngPsDQHHw3ki0RB78q2aX993WSnpZhZ3bnQ-iBQMyVEPCs_U_7HVMGIgYgitygDWWMRIpTeod2IZwAgMqECsE2qCizQ_6CS9tPo9cdzubZDnM7DpHRwTb44EejTdu1YW5rfFx6PeBpDBbrocHhqCeL83XV63PmHt063QW7-59bVL7m5f49Kj7fPvZZEWlB04gyIxuQtVGWaZqmvFGuXqXlgkGqqHWag6iNIalzLHF1QiQhjENCHXBp2RY9Xs9eUKvJr-_9b3VGri7Iq-Pp6pj8-LXYMFencfHD2qmiChRnRMmU_QFS9lip</recordid><startdate>20230429</startdate><enddate>20230429</enddate><creator>Vasilikopoulos, Nikolaos</creator><creator>Kolotouros, Nikos</creator><creator>Tsoli, Aggeliki</creator><creator>Argyros, Antonis</creator><general>Cornell University Library, arXiv.org</general><scope>8FE</scope><scope>8FG</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>HCIFZ</scope><scope>L6V</scope><scope>M7S</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20230429</creationdate><title>TAPE: Temporal Attention-based Probabilistic human pose and shape Estimation</title><author>Vasilikopoulos, Nikolaos ; Kolotouros, Nikos ; Tsoli, Aggeliki ; Argyros, 
Antonis</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a529-23b6d06cb8e3a2994d8fc6cbe4530982efa405cbb19ff37fc7161134072f046e3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Computer Science - Computer Vision and Pattern Recognition</topic><topic>Image processing</topic><topic>Image reconstruction</topic><topic>Neural networks</topic><topic>Optimization</topic><topic>Statistical analysis</topic><topic>Video</topic><toplevel>online_resources</toplevel><creatorcontrib>Vasilikopoulos, Nikolaos</creatorcontrib><creatorcontrib>Kolotouros, Nikos</creatorcontrib><creatorcontrib>Tsoli, Aggeliki</creatorcontrib><creatorcontrib>Argyros, Antonis</creatorcontrib><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Materials Science & Engineering Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Engineering Collection</collection><collection>Engineering Database</collection><collection>Access via ProQuest (Open Access)</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering Collection</collection><collection>arXiv Computer Science</collection><collection>arXiv.org</collection><jtitle>arXiv.org</jtitle></facets><delivery><delcategory>Remote Search 
Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Vasilikopoulos, Nikolaos</au><au>Kolotouros, Nikos</au><au>Tsoli, Aggeliki</au><au>Argyros, Antonis</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>TAPE: Temporal Attention-based Probabilistic human pose and shape Estimation</atitle><jtitle>arXiv.org</jtitle><date>2023-04-29</date><risdate>2023</risdate><eissn>2331-8422</eissn><abstract>Reconstructing 3D human pose and shape from monocular videos is a well-studied but challenging problem. Common challenges include occlusions, the inherent ambiguities in the 2D to 3D mapping and the computational complexity of video processing. Existing methods ignore the ambiguities of the reconstruction and provide a single deterministic estimate for the 3D pose. In order to address these issues, we present a Temporal Attention based Probabilistic human pose and shape Estimation method (TAPE) that operates on an RGB video. More specifically, we propose to use a neural network to encode video frames to temporal features using an attention-based neural network. Given these features, we output a per-frame but temporally-informed probability distribution for the human pose using Normalizing Flows. We show that TAPE outperforms state-of-the-art methods in standard benchmarks and serves as an effective video-based prior for optimization-based human pose and shape estimation. Code is available at: https: //github.com/nikosvasilik/TAPE</abstract><cop>Ithaca</cop><pub>Cornell University Library, arXiv.org</pub><doi>10.48550/arxiv.2305.00181</doi><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 2023-04 |
issn | 2331-8422 |
language | eng |
recordid | cdi_arxiv_primary_2305_00181 |
source | Freely Accessible Journals; arXiv.org |
subjects | Computer Science - Computer Vision and Pattern Recognition; Image processing; Image reconstruction; Neural networks; Optimization; Statistical analysis; Video |
title | TAPE: Temporal Attention-based Probabilistic human pose and shape Estimation |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-13T10%3A56%3A39IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_arxiv&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=TAPE:%20Temporal%20Attention-based%20Probabilistic%20human%20pose%20and%20shape%20Estimation&rft.jtitle=arXiv.org&rft.au=Vasilikopoulos,%20Nikolaos&rft.date=2023-04-29&rft.eissn=2331-8422&rft_id=info:doi/10.48550/arxiv.2305.00181&rft_dat=%3Cproquest_arxiv%3E2808431869%3C/proquest_arxiv%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2808431869&rft_id=info:pmid/&rfr_iscdi=true |
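
The abstract names two ingredients: attention over video frames to obtain temporal features, and a normalizing flow that turns those features into a per-frame probability distribution. The following is a minimal sketch of both ideas in plain Python, not the authors' implementation. The function names, the one-dimensional "pose" value, and the way the flow parameters are derived from the features are all hypothetical simplifications chosen for illustration; the record does not describe the actual architecture.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention(frames):
    """Scaled dot-product self-attention across per-frame feature vectors.

    Each output vector is a weighted average of all frame features, so every
    frame's representation is informed by the rest of the clip.
    """
    d = len(frames[0])
    out = []
    for q in frames:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in frames]
        w = softmax(scores)
        out.append([sum(w[t] * frames[t][i] for t in range(len(frames)))
                    for i in range(d)])
    return out


def affine_flow_log_prob(x, context):
    """log p(x | context) for the one-layer flow x = mu + sigma * z, z ~ N(0, 1).

    Hypothetical conditioning: mu and log_sigma are simple functions of the
    context vector; a real model would predict them with a learned network.
    """
    mu = sum(context) / len(context)
    log_sigma = 0.1 * context[0]
    z = (x - mu) / math.exp(log_sigma)
    log_base = -0.5 * (z * z + math.log(2.0 * math.pi))
    # Change of variables: subtract log|dx/dz| = log_sigma.
    return log_base - log_sigma


if __name__ == "__main__":
    clip = [[0.2, 0.1], [0.4, 0.3], [0.1, 0.5]]  # three frames, toy features
    features = temporal_attention(clip)
    for t, f in enumerate(features):
        print(t, affine_flow_log_prob(0.3, f))
```

The flow here is a single affine layer so that the change-of-variables term is easy to read; practical normalizing flows stack many invertible layers and sum their log-determinant terms.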