Image-Based Synthesis for Deep 3D Human Pose Estimation
This paper addresses the problem of 3D human pose estimation in the wild. A significant challenge is the lack of training data, i.e., 2D images of humans annotated with 3D poses. Such data is necessary to train state-of-the-art CNN architectures. Here, we propose a solution to generate a large set o...
Published in: | International journal of computer vision 2018-09, Vol.126 (9), p.993-1008 |
---|---|
Main authors: | Rogez, Grégory ; Schmid, Cordelia |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 1008 |
---|---|
container_issue | 9 |
container_start_page | 993 |
container_title | International journal of computer vision |
container_volume | 126 |
creator | Rogez, Grégory ; Schmid, Cordelia |
description | This paper addresses the problem of 3D human pose estimation in the wild. A significant challenge is the lack of training data, i.e., 2D images of humans annotated with 3D poses. Such data is necessary to train state-of-the-art CNN architectures. Here, we propose a solution to generate a large set of photorealistic synthetic images of humans with 3D pose annotations. We introduce an image-based synthesis engine that artificially augments a dataset of real images with 2D human pose annotations using 3D motion capture data. Given a candidate 3D pose, our algorithm selects for each joint an image whose 2D pose locally matches the projected 3D pose. The selected images are then combined to generate a new synthetic image by stitching local image patches in a kinematically constrained manner. The resulting images are used to train an end-to-end CNN for full-body 3D pose estimation. We cluster the training data into a large number of pose classes and tackle pose estimation as a K-way classification problem. Such an approach is viable only with large training sets such as ours. Our method outperforms most of the published works in terms of 3D pose estimation in controlled environments (Human3.6M) and shows promising results for real-world images (LSP). This demonstrates that CNNs trained on artificial images generalize well to real images. Compared to data generated from more classical rendering engines, our synthetic images do not require any domain adaptation or fine-tuning stage. |
doi_str_mv | 10.1007/s11263-018-1071-9 |
format | Article |
fullrecord | <record><control><sourceid>gale_hal_p</sourceid><recordid>TN_cdi_hal_primary_oai_HAL_hal_01717188v1</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><galeid>A550220576</galeid><sourcerecordid>A550220576</sourcerecordid><originalsourceid>FETCH-LOGICAL-c466t-f71ef192b02a6966229b429c3ada89128bed981eb8988a82f4eece9c08709be43</originalsourceid><addsrcrecordid>eNp1kU1LAzEQhoMoWD9-gLcFTx6iM9ndbHKsWm2hoFg9h-x2tq60m5psRf-9KVsUDzKHgeF5kwdexs4QLhGguAqIQqYcUHGEArneYwPMi5RjBvk-G4AWwHOp8ZAdhfAGAEKJdMCKycouiF_bQPNk9tV2rxSakNTOJ7dE6yS9TcablW2TRxcoGYWuWdmuce0JO6jtMtDpbh-zl7vR882YTx_uJzfDKa8yKTteF0g1alGCsFJLKYQuM6Gr1M6t0ihUSXOtkEqllbJK1BlRRboCVYAuKUuP2UX_7qtdmrWPv_sv42xjxsOp2d4AizhKfWBkz3t27d37hkJn3tzGt1HPCMA8yxGkitRlTy3skkzT1q7ztoozp1VTuZbqJt6HeQ5CQF7IX4VdIDIdfXYLuwnBTGZPf1ns2cq7EDzVP84IZluU6YuK2spsizI6ZkSfCZFtF-R_tf8PfQMa_pDg</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2015451068</pqid></control><display><type>article</type><title>Image-Based Synthesis for Deep 3D Human Pose Estimation</title><source>SpringerLink_现刊</source><creator>Rogez, Grégory ; Schmid, Cordelia</creator><creatorcontrib>Rogez, Grégory ; Schmid, Cordelia</creatorcontrib><description>This paper addresses the problem of 3D human pose estimation in the wild. A significant challenge is the lack of training data, i.e., 2D images of humans annotated with 3D poses. Such data is necessary to train state-of-the-art CNN architectures. Here, we propose a solution to generate a large set of photorealistic synthetic images of humans with 3D pose annotations. We introduce an image-based synthesis engine that artificially augments a dataset of real images with 2D human pose annotations using 3D motion capture data. Given a candidate 3D pose, our algorithm selects for each joint an image whose 2D pose locally matches the projected 3D pose. 
The selected images are then combined to generate a new synthetic image by stitching local image patches in a kinematically constrained manner. The resulting images are used to train an end-to-end CNN for full-body 3D pose estimation. We cluster the training data into a large number of pose classes and tackle pose estimation as a K-way classification problem. Such an approach is viable only with large training sets such as ours. Our method outperforms most of the published works in terms of 3D pose estimation in controlled environments (Human3.6M) and shows promising results for real-world images (LSP). This demonstrates that CNNs trained on artificial images generalize well to real images. Compared to data generated from more classical rendering engines, our synthetic images do not require any domain adaptation or fine-tuning stage.</description><identifier>ISSN: 0920-5691</identifier><identifier>EISSN: 1573-1405</identifier><identifier>DOI: 10.1007/s11263-018-1071-9</identifier><language>eng</language><publisher>New York: Springer US</publisher><subject>Annotations ; Artificial Intelligence ; Computer Imaging ; Computer Science ; Computer Vision and Pattern Recognition ; Human motion ; Image Processing and Computer Vision ; Motion capture ; Pattern Recognition ; Pattern Recognition and Graphics ; Pose estimation ; Stitching ; Synthesis ; Three dimensional bodies ; Three dimensional motion ; Training ; Vision</subject><ispartof>International journal of computer vision, 2018-09, Vol.126 (9), p.993-1008</ispartof><rights>Springer Science+Business Media, LLC, part of Springer Nature 2018</rights><rights>COPYRIGHT 2018 Springer</rights><rights>International Journal of Computer Vision is a copyright of Springer, (2018). 
All Rights Reserved.</rights><rights>Distributed under a Creative Commons Attribution 4.0 International License</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c466t-f71ef192b02a6966229b429c3ada89128bed981eb8988a82f4eece9c08709be43</citedby><cites>FETCH-LOGICAL-c466t-f71ef192b02a6966229b429c3ada89128bed981eb8988a82f4eece9c08709be43</cites><orcidid>0000-0002-2275-2129</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/s11263-018-1071-9$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1007/s11263-018-1071-9$$EHTML$$P50$$Gspringer$$H</linktohtml><link.rule.ids>230,314,780,784,885,27924,27925,41488,42557,51319</link.rule.ids><backlink>$$Uhttps://inria.hal.science/hal-01717188$$DView record in HAL$$Hfree_for_read</backlink></links><search><creatorcontrib>Rogez, Grégory</creatorcontrib><creatorcontrib>Schmid, Cordelia</creatorcontrib><title>Image-Based Synthesis for Deep 3D Human Pose Estimation</title><title>International journal of computer vision</title><addtitle>Int J Comput Vis</addtitle><description>This paper addresses the problem of 3D human pose estimation in the wild. A significant challenge is the lack of training data, i.e., 2D images of humans annotated with 3D poses. Such data is necessary to train state-of-the-art CNN architectures. Here, we propose a solution to generate a large set of photorealistic synthetic images of humans with 3D pose annotations. We introduce an image-based synthesis engine that artificially augments a dataset of real images with 2D human pose annotations using 3D motion capture data. Given a candidate 3D pose, our algorithm selects for each joint an image whose 2D pose locally matches the projected 3D pose. 
The selected images are then combined to generate a new synthetic image by stitching local image patches in a kinematically constrained manner. The resulting images are used to train an end-to-end CNN for full-body 3D pose estimation. We cluster the training data into a large number of pose classes and tackle pose estimation as a K-way classification problem. Such an approach is viable only with large training sets such as ours. Our method outperforms most of the published works in terms of 3D pose estimation in controlled environments (Human3.6M) and shows promising results for real-world images (LSP). This demonstrates that CNNs trained on artificial images generalize well to real images. Compared to data generated from more classical rendering engines, our synthetic images do not require any domain adaptation or fine-tuning stage.</description><subject>Annotations</subject><subject>Artificial Intelligence</subject><subject>Computer Imaging</subject><subject>Computer Science</subject><subject>Computer Vision and Pattern Recognition</subject><subject>Human motion</subject><subject>Image Processing and Computer Vision</subject><subject>Motion capture</subject><subject>Pattern Recognition</subject><subject>Pattern Recognition and Graphics</subject><subject>Pose estimation</subject><subject>Stitching</subject><subject>Synthesis</subject><subject>Three dimensional bodies</subject><subject>Three dimensional 
motion</subject><subject>Training</subject><subject>Vision</subject><issn>0920-5691</issn><issn>1573-1405</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2018</creationdate><recordtype>article</recordtype><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><sourceid>GNUQQ</sourceid><recordid>eNp1kU1LAzEQhoMoWD9-gLcFTx6iM9ndbHKsWm2hoFg9h-x2tq60m5psRf-9KVsUDzKHgeF5kwdexs4QLhGguAqIQqYcUHGEArneYwPMi5RjBvk-G4AWwHOp8ZAdhfAGAEKJdMCKycouiF_bQPNk9tV2rxSakNTOJ7dE6yS9TcablW2TRxcoGYWuWdmuce0JO6jtMtDpbh-zl7vR882YTx_uJzfDKa8yKTteF0g1alGCsFJLKYQuM6Gr1M6t0ihUSXOtkEqllbJK1BlRRboCVYAuKUuP2UX_7qtdmrWPv_sv42xjxsOp2d4AizhKfWBkz3t27d37hkJn3tzGt1HPCMA8yxGkitRlTy3skkzT1q7ztoozp1VTuZbqJt6HeQ5CQF7IX4VdIDIdfXYLuwnBTGZPf1ns2cq7EDzVP84IZluU6YuK2spsizI6ZkSfCZFtF-R_tf8PfQMa_pDg</recordid><startdate>20180901</startdate><enddate>20180901</enddate><creator>Rogez, Grégory</creator><creator>Schmid, Cordelia</creator><general>Springer US</general><general>Springer</general><general>Springer Nature B.V</general><general>Springer 
Verlag</general><scope>AAYXX</scope><scope>CITATION</scope><scope>ISR</scope><scope>3V.</scope><scope>7SC</scope><scope>7WY</scope><scope>7WZ</scope><scope>7XB</scope><scope>87Z</scope><scope>8AL</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>8FK</scope><scope>8FL</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BEZIV</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>FRNLG</scope><scope>F~G</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K60</scope><scope>K6~</scope><scope>K7-</scope><scope>L.-</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>M0C</scope><scope>M0N</scope><scope>P5Z</scope><scope>P62</scope><scope>PQBIZ</scope><scope>PQBZA</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PYYUZ</scope><scope>Q9U</scope><scope>1XC</scope><scope>VOOES</scope><orcidid>https://orcid.org/0000-0002-2275-2129</orcidid></search><sort><creationdate>20180901</creationdate><title>Image-Based Synthesis for Deep 3D Human Pose Estimation</title><author>Rogez, Grégory ; Schmid, Cordelia</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c466t-f71ef192b02a6966229b429c3ada89128bed981eb8988a82f4eece9c08709be43</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2018</creationdate><topic>Annotations</topic><topic>Artificial Intelligence</topic><topic>Computer Imaging</topic><topic>Computer Science</topic><topic>Computer Vision and Pattern Recognition</topic><topic>Human motion</topic><topic>Image Processing and Computer Vision</topic><topic>Motion capture</topic><topic>Pattern Recognition</topic><topic>Pattern Recognition and Graphics</topic><topic>Pose estimation</topic><topic>Stitching</topic><topic>Synthesis</topic><topic>Three dimensional bodies</topic><topic>Three dimensional 
motion</topic><topic>Training</topic><topic>Vision</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Rogez, Grégory</creatorcontrib><creatorcontrib>Schmid, Cordelia</creatorcontrib><collection>CrossRef</collection><collection>Science (Gale in Context)</collection><collection>ProQuest Central (Corporate)</collection><collection>Computer and Information Systems Abstracts</collection><collection>ProQuest_ABI/INFORM Collection</collection><collection>ABI/INFORM Global (PDF only)</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>ABI/INFORM Global (Alumni Edition)</collection><collection>Computing Database (Alumni Edition)</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>ABI/INFORM Collection (Alumni Edition)</collection><collection>ProQuest Central (Alumni)</collection><collection>ProQuest Central</collection><collection>Advanced Technologies & Aerospace Collection</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>ProQuest Business Premium Collection</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central</collection><collection>Business Premium Collection (Alumni)</collection><collection>ABI/INFORM Global (Corporate)</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>ProQuest Business Collection (Alumni Edition)</collection><collection>ProQuest Business Collection</collection><collection>Computer science database</collection><collection>ABI/INFORM Professional 
Advanced</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>ABI/INFORM Global</collection><collection>Computing Database</collection><collection>ProQuest advanced technologies & aerospace journals</collection><collection>ProQuest Advanced Technologies & Aerospace Collection</collection><collection>ProQuest One Business</collection><collection>ProQuest One Business (Alumni)</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>ABI/INFORM Collection China</collection><collection>ProQuest Central Basic</collection><collection>Hyper Article en Ligne (HAL)</collection><collection>Hyper Article en Ligne (HAL) (Open Access)</collection><jtitle>International journal of computer vision</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Rogez, Grégory</au><au>Schmid, Cordelia</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Image-Based Synthesis for Deep 3D Human Pose Estimation</atitle><jtitle>International journal of computer vision</jtitle><stitle>Int J Comput Vis</stitle><date>2018-09-01</date><risdate>2018</risdate><volume>126</volume><issue>9</issue><spage>993</spage><epage>1008</epage><pages>993-1008</pages><issn>0920-5691</issn><eissn>1573-1405</eissn><abstract>This paper addresses the problem of 3D human pose estimation in the wild. A significant challenge is the lack of training data, i.e., 2D images of humans annotated with 3D poses. Such data is necessary to train state-of-the-art CNN architectures. 
Here, we propose a solution to generate a large set of photorealistic synthetic images of humans with 3D pose annotations. We introduce an image-based synthesis engine that artificially augments a dataset of real images with 2D human pose annotations using 3D motion capture data. Given a candidate 3D pose, our algorithm selects for each joint an image whose 2D pose locally matches the projected 3D pose. The selected images are then combined to generate a new synthetic image by stitching local image patches in a kinematically constrained manner. The resulting images are used to train an end-to-end CNN for full-body 3D pose estimation. We cluster the training data into a large number of pose classes and tackle pose estimation as a K-way classification problem. Such an approach is viable only with large training sets such as ours. Our method outperforms most of the published works in terms of 3D pose estimation in controlled environments (Human3.6M) and shows promising results for real-world images (LSP). This demonstrates that CNNs trained on artificial images generalize well to real images. Compared to data generated from more classical rendering engines, our synthetic images do not require any domain adaptation or fine-tuning stage.</abstract><cop>New York</cop><pub>Springer US</pub><doi>10.1007/s11263-018-1071-9</doi><tpages>16</tpages><orcidid>https://orcid.org/0000-0002-2275-2129</orcidid><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 0920-5691 |
ispartof | International journal of computer vision, 2018-09, Vol.126 (9), p.993-1008 |
issn | 0920-5691 ; 1573-1405 |
language | eng |
recordid | cdi_hal_primary_oai_HAL_hal_01717188v1 |
source | SpringerLink_现刊 |
subjects | Annotations ; Artificial Intelligence ; Computer Imaging ; Computer Science ; Computer Vision and Pattern Recognition ; Human motion ; Image Processing and Computer Vision ; Motion capture ; Pattern Recognition ; Pattern Recognition and Graphics ; Pose estimation ; Stitching ; Synthesis ; Three dimensional bodies ; Three dimensional motion ; Training ; Vision |
title | Image-Based Synthesis for Deep 3D Human Pose Estimation |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-01T10%3A19%3A45IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_hal_p&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Image-Based%20Synthesis%20for%20Deep%203D%20Human%20Pose%20Estimation&rft.jtitle=International%20journal%20of%20computer%20vision&rft.au=Rogez,%20Gr%C3%A9gory&rft.date=2018-09-01&rft.volume=126&rft.issue=9&rft.spage=993&rft.epage=1008&rft.pages=993-1008&rft.issn=0920-5691&rft.eissn=1573-1405&rft_id=info:doi/10.1007/s11263-018-1071-9&rft_dat=%3Cgale_hal_p%3EA550220576%3C/gale_hal_p%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2015451068&rft_id=info:pmid/&rft_galeid=A550220576&rfr_iscdi=true |
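
The abstract states that, given a candidate 3D pose, the algorithm selects for each joint an image whose 2D pose locally matches the projected 3D pose. The sketch below illustrates one possible reading of that selection step and is not the authors' implementation. The function name `select_image_per_joint`, the `neighbors` map that defines each joint's local neighborhood, and the squared-distance cost computed after centering on the joint are all assumptions made for illustration.

```python
def select_image_per_joint(projected_pose, dataset_poses, neighbors):
    """Pick, for every joint, the dataset image whose 2D pose matches best locally.

    projected_pose: list of (x, y) tuples, the 2D projection of the candidate 3D pose.
    dataset_poses:  list of 2D poses (each a list of (x, y) tuples), one per real image.
    neighbors:      dict mapping a joint index to the joints in its local
                    neighborhood (hypothetical definition of "locally").
    Returns a list with one selected image index per joint.
    """
    selected = []
    for j, (jx, jy) in enumerate(projected_pose):
        local = [j] + list(neighbors.get(j, []))
        best_index, best_cost = None, float("inf")
        for index, pose in enumerate(dataset_poses):
            cx, cy = pose[j]
            cost = 0.0
            for k in local:
                # Centre both poses on joint j so only the local shape is compared.
                tx, ty = projected_pose[k][0] - jx, projected_pose[k][1] - jy
                px, py = pose[k][0] - cx, pose[k][1] - cy
                cost += (px - tx) ** 2 + (py - ty) ** 2
            # Strict comparison keeps the first image when costs tie.
            if cost < best_cost:
                best_index, best_cost = index, cost
        selected.append(best_index)
    return selected
```

Different joints may be assigned different source images, which is what makes the later patch-stitching step described in the abstract necessary. The kinematic constraints applied during stitching are not modeled here.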