Authentic volumetric avatars from a phone scan

Creating photorealistic avatars of existing people currently requires extensive person-specific data capture, which is usually only accessible to the VFX industry and not the general public. Our work aims to address this drawback by relying only on a short mobile phone capture to obtain a drivable 3...

Bibliographic details
Published in: ACM transactions on graphics 2022-07, Vol.41 (4), p.1-19, Article 163
Main authors: Cao, Chen; Simon, Tomas; Kim, Jin Kyu; Schwartz, Gabe; Zollhoefer, Michael; Saito, Shun-Suke; Lombardi, Stephen; Wei, Shih-En; Belko, Danielle; Yu, Shoou-I; Sheikh, Yaser; Saragih, Jason
Format: Article
Language: eng
Subjects: Animation; Computer graphics; Computing methodologies
Online access: Full text
container_end_page 19
container_issue 4
container_start_page 1
container_title ACM transactions on graphics
container_volume 41
creator Cao, Chen
Simon, Tomas
Kim, Jin Kyu
Schwartz, Gabe
Zollhoefer, Michael
Saito, Shun-Suke
Lombardi, Stephen
Wei, Shih-En
Belko, Danielle
Yu, Shoou-I
Sheikh, Yaser
Saragih, Jason
description Creating photorealistic avatars of existing people currently requires extensive person-specific data capture, which is usually only accessible to the VFX industry and not the general public. Our work aims to address this drawback by relying only on a short mobile phone capture to obtain a drivable 3D head avatar that matches a person's likeness faithfully. In contrast to existing approaches, our architecture avoids the complex task of directly modeling the entire manifold of human appearance, aiming instead to generate an avatar model that can be specialized to novel identities using only small amounts of data. The model dispenses with low-dimensional latent spaces that are commonly employed for hallucinating novel identities, and instead, uses a conditional representation that can extract person-specific information at multiple scales from a high resolution registered neutral phone scan. We achieve high quality results through the use of a novel universal avatar prior that has been trained on high resolution multi-view video captures of facial performances of hundreds of human subjects. By fine-tuning the model using inverse rendering we achieve increased realism and personalize its range of motion. The output of our approach is not only a high-fidelity 3D head avatar that matches the person's facial shape and appearance, but one that can also be driven using a jointly discovered shared global expression space with disentangled controls for gaze direction. Via a series of experiments we demonstrate that our avatars are faithful representations of the subject's likeness. Compared to other state-of-the-art methods for lightweight avatar creation, our approach exhibits superior visual quality and animateability.
doi_str_mv 10.1145/3528223.3530143
format Article
fullrecord <record><control><sourceid>acm_cross</sourceid><recordid>TN_cdi_crossref_primary_10_1145_3528223_3530143</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>3530143</sourcerecordid><originalsourceid>FETCH-LOGICAL-a301t-a6fb386c142d41b2dae3b4bbc36c36edfc173a3ee71cba9396ace505b7b0f1fc3</originalsourceid><addsrcrecordid>eNo9j09Lw0AQxRdRMFbPgqd8gaQzmf3THktRKxS86DnMbnZppUnKblrw2xtpFAbewJv3hp8QjwglolRzUtWiqqgkRYCSrkSGSpnCkF5ciwwMQQGjcyvuUvoCAC2lzkS5Og073w17l5_7w6n1QxxXPvPAMeUh9m3O-XHXdz5Pjrt7cRP4kPzDpDPx-fL8sd4U2_fXt_VqW_D4YihYB0sL7VBWjURbNezJSmsd6XF8ExwaYvLeoLO8pKVm5xUoaywEDI5mYn7pdbFPKfpQH-O-5fhdI9S_uPWEW0-4Y-LpkmDX_h__mT_op1BH</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Authentic volumetric avatars from a phone scan</title><source>ACM Digital Library Complete</source><creator>Cao, Chen ; Simon, Tomas ; Kim, Jin Kyu ; Schwartz, Gabe ; Zollhoefer, Michael ; Saito, Shun-Suke ; Lombardi, Stephen ; Wei, Shih-En ; Belko, Danielle ; Yu, Shoou-I ; Sheikh, Yaser ; Saragih, Jason</creator><creatorcontrib>Cao, Chen ; Simon, Tomas ; Kim, Jin Kyu ; Schwartz, Gabe ; Zollhoefer, Michael ; Saito, Shun-Suke ; Lombardi, Stephen ; Wei, Shih-En ; Belko, Danielle ; Yu, Shoou-I ; Sheikh, Yaser ; Saragih, Jason</creatorcontrib><description>Creating photorealistic avatars of existing people currently requires extensive person-specific data capture, which is usually only accessible to the VFX industry and not the general public. Our work aims to address this drawback by relying only on a short mobile phone capture to obtain a drivable 3D head avatar that matches a person's likeness faithfully. 
In contrast to existing approaches, our architecture avoids the complex task of directly modeling the entire manifold of human appearance, aiming instead to generate an avatar model that can be specialized to novel identities using only small amounts of data. The model dispenses with low-dimensional latent spaces that are commonly employed for hallucinating novel identities, and instead, uses a conditional representation that can extract person-specific information at multiple scales from a high resolution registered neutral phone scan. We achieve high quality results through the use of a novel universal avatar prior that has been trained on high resolution multi-view video captures of facial performances of hundreds of human subjects. By fine-tuning the model using inverse rendering we achieve increased realism and personalize its range of motion. The output of our approach is not only a high-fidelity 3D head avatar that matches the person's facial shape and appearance, but one that can also be driven using a jointly discovered shared global expression space with disentangled controls for gaze direction. Via a series of experiments we demonstrate that our avatars are faithful representations of the subject's likeness. 
Compared to other state-of-the-art methods for lightweight avatar creation, our approach exhibits superior visual quality and animateability.</description><identifier>ISSN: 0730-0301</identifier><identifier>EISSN: 1557-7368</identifier><identifier>DOI: 10.1145/3528223.3530143</identifier><language>eng</language><publisher>New York, NY, USA: ACM</publisher><subject>Animation ; Computer graphics ; Computing methodologies</subject><ispartof>ACM transactions on graphics, 2022-07, Vol.41 (4), p.1-19, Article 163</ispartof><rights>Owner/Author</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-a301t-a6fb386c142d41b2dae3b4bbc36c36edfc173a3ee71cba9396ace505b7b0f1fc3</citedby><cites>FETCH-LOGICAL-a301t-a6fb386c142d41b2dae3b4bbc36c36edfc173a3ee71cba9396ace505b7b0f1fc3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://dl.acm.org/doi/pdf/10.1145/3528223.3530143$$EPDF$$P50$$Gacm$$Hfree_for_read</linktopdf><link.rule.ids>314,776,780,2276,27901,27902,40172,75971</link.rule.ids></links><search><creatorcontrib>Cao, Chen</creatorcontrib><creatorcontrib>Simon, Tomas</creatorcontrib><creatorcontrib>Kim, Jin Kyu</creatorcontrib><creatorcontrib>Schwartz, Gabe</creatorcontrib><creatorcontrib>Zollhoefer, Michael</creatorcontrib><creatorcontrib>Saito, Shun-Suke</creatorcontrib><creatorcontrib>Lombardi, Stephen</creatorcontrib><creatorcontrib>Wei, Shih-En</creatorcontrib><creatorcontrib>Belko, Danielle</creatorcontrib><creatorcontrib>Yu, Shoou-I</creatorcontrib><creatorcontrib>Sheikh, Yaser</creatorcontrib><creatorcontrib>Saragih, Jason</creatorcontrib><title>Authentic volumetric avatars from a phone scan</title><title>ACM transactions on graphics</title><addtitle>ACM TOG</addtitle><description>Creating photorealistic avatars of existing people currently requires 
extensive person-specific data capture, which is usually only accessible to the VFX industry and not the general public. Our work aims to address this drawback by relying only on a short mobile phone capture to obtain a drivable 3D head avatar that matches a person's likeness faithfully. In contrast to existing approaches, our architecture avoids the complex task of directly modeling the entire manifold of human appearance, aiming instead to generate an avatar model that can be specialized to novel identities using only small amounts of data. The model dispenses with low-dimensional latent spaces that are commonly employed for hallucinating novel identities, and instead, uses a conditional representation that can extract person-specific information at multiple scales from a high resolution registered neutral phone scan. We achieve high quality results through the use of a novel universal avatar prior that has been trained on high resolution multi-view video captures of facial performances of hundreds of human subjects. By fine-tuning the model using inverse rendering we achieve increased realism and personalize its range of motion. The output of our approach is not only a high-fidelity 3D head avatar that matches the person's facial shape and appearance, but one that can also be driven using a jointly discovered shared global expression space with disentangled controls for gaze direction. Via a series of experiments we demonstrate that our avatars are faithful representations of the subject's likeness. 
Compared to other state-of-the-art methods for lightweight avatar creation, our approach exhibits superior visual quality and animateability.</description><subject>Animation</subject><subject>Computer graphics</subject><subject>Computing methodologies</subject><issn>0730-0301</issn><issn>1557-7368</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><recordid>eNo9j09Lw0AQxRdRMFbPgqd8gaQzmf3THktRKxS86DnMbnZppUnKblrw2xtpFAbewJv3hp8QjwglolRzUtWiqqgkRYCSrkSGSpnCkF5ciwwMQQGjcyvuUvoCAC2lzkS5Og073w17l5_7w6n1QxxXPvPAMeUh9m3O-XHXdz5Pjrt7cRP4kPzDpDPx-fL8sd4U2_fXt_VqW_D4YihYB0sL7VBWjURbNezJSmsd6XF8ExwaYvLeoLO8pKVm5xUoaywEDI5mYn7pdbFPKfpQH-O-5fhdI9S_uPWEW0-4Y-LpkmDX_h__mT_op1BH</recordid><startdate>20220722</startdate><enddate>20220722</enddate><creator>Cao, Chen</creator><creator>Simon, Tomas</creator><creator>Kim, Jin Kyu</creator><creator>Schwartz, Gabe</creator><creator>Zollhoefer, Michael</creator><creator>Saito, Shun-Suke</creator><creator>Lombardi, Stephen</creator><creator>Wei, Shih-En</creator><creator>Belko, Danielle</creator><creator>Yu, Shoou-I</creator><creator>Sheikh, Yaser</creator><creator>Saragih, Jason</creator><general>ACM</general><scope>AAYXX</scope><scope>CITATION</scope></search><sort><creationdate>20220722</creationdate><title>Authentic volumetric avatars from a phone scan</title><author>Cao, Chen ; Simon, Tomas ; Kim, Jin Kyu ; Schwartz, Gabe ; Zollhoefer, Michael ; Saito, Shun-Suke ; Lombardi, Stephen ; Wei, Shih-En ; Belko, Danielle ; Yu, Shoou-I ; Sheikh, Yaser ; Saragih, Jason</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a301t-a6fb386c142d41b2dae3b4bbc36c36edfc173a3ee71cba9396ace505b7b0f1fc3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Animation</topic><topic>Computer graphics</topic><topic>Computing 
methodologies</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Cao, Chen</creatorcontrib><creatorcontrib>Simon, Tomas</creatorcontrib><creatorcontrib>Kim, Jin Kyu</creatorcontrib><creatorcontrib>Schwartz, Gabe</creatorcontrib><creatorcontrib>Zollhoefer, Michael</creatorcontrib><creatorcontrib>Saito, Shun-Suke</creatorcontrib><creatorcontrib>Lombardi, Stephen</creatorcontrib><creatorcontrib>Wei, Shih-En</creatorcontrib><creatorcontrib>Belko, Danielle</creatorcontrib><creatorcontrib>Yu, Shoou-I</creatorcontrib><creatorcontrib>Sheikh, Yaser</creatorcontrib><creatorcontrib>Saragih, Jason</creatorcontrib><collection>CrossRef</collection><jtitle>ACM transactions on graphics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Cao, Chen</au><au>Simon, Tomas</au><au>Kim, Jin Kyu</au><au>Schwartz, Gabe</au><au>Zollhoefer, Michael</au><au>Saito, Shun-Suke</au><au>Lombardi, Stephen</au><au>Wei, Shih-En</au><au>Belko, Danielle</au><au>Yu, Shoou-I</au><au>Sheikh, Yaser</au><au>Saragih, Jason</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Authentic volumetric avatars from a phone scan</atitle><jtitle>ACM transactions on graphics</jtitle><stitle>ACM TOG</stitle><date>2022-07-22</date><risdate>2022</risdate><volume>41</volume><issue>4</issue><spage>1</spage><epage>19</epage><pages>1-19</pages><artnum>163</artnum><issn>0730-0301</issn><eissn>1557-7368</eissn><abstract>Creating photorealistic avatars of existing people currently requires extensive person-specific data capture, which is usually only accessible to the VFX industry and not the general public. Our work aims to address this drawback by relying only on a short mobile phone capture to obtain a drivable 3D head avatar that matches a person's likeness faithfully. 
In contrast to existing approaches, our architecture avoids the complex task of directly modeling the entire manifold of human appearance, aiming instead to generate an avatar model that can be specialized to novel identities using only small amounts of data. The model dispenses with low-dimensional latent spaces that are commonly employed for hallucinating novel identities, and instead, uses a conditional representation that can extract person-specific information at multiple scales from a high resolution registered neutral phone scan. We achieve high quality results through the use of a novel universal avatar prior that has been trained on high resolution multi-view video captures of facial performances of hundreds of human subjects. By fine-tuning the model using inverse rendering we achieve increased realism and personalize its range of motion. The output of our approach is not only a high-fidelity 3D head avatar that matches the person's facial shape and appearance, but one that can also be driven using a jointly discovered shared global expression space with disentangled controls for gaze direction. Via a series of experiments we demonstrate that our avatars are faithful representations of the subject's likeness. Compared to other state-of-the-art methods for lightweight avatar creation, our approach exhibits superior visual quality and animateability.</abstract><cop>New York, NY, USA</cop><pub>ACM</pub><doi>10.1145/3528223.3530143</doi><tpages>19</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 0730-0301
ispartof ACM transactions on graphics, 2022-07, Vol.41 (4), p.1-19, Article 163
issn 0730-0301
1557-7368
language eng
recordid cdi_crossref_primary_10_1145_3528223_3530143
source ACM Digital Library Complete
subjects Animation
Computer graphics
Computing methodologies
title Authentic volumetric avatars from a phone scan
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-01T15%3A11%3A30IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-acm_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Authentic%20volumetric%20avatars%20from%20a%20phone%20scan&rft.jtitle=ACM%20transactions%20on%20graphics&rft.au=Cao,%20Chen&rft.date=2022-07-22&rft.volume=41&rft.issue=4&rft.spage=1&rft.epage=19&rft.pages=1-19&rft.artnum=163&rft.issn=0730-0301&rft.eissn=1557-7368&rft_id=info:doi/10.1145/3528223.3530143&rft_dat=%3Cacm_cross%3E3530143%3C/acm_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true