Neural Head Avatars from Monocular RGB Videos

We present Neural Head Avatars, a novel neural representation that explicitly models the surface geometry and appearance of an animatable human avatar that can be used for teleconferencing in AR/VR or other applications in the movie or games industry that rely on a digital human. Our representation...

Bibliographic details
Main authors:	Grassal, Philip-William; Prinzler, Malte; Leistner, Titus; Rother, Carsten; Nießner, Matthias; Thies, Justus
Format:	Article
Language:	eng
Subjects:	Computer Science - Computer Vision and Pattern Recognition; Computer Science - Graphics

creator	Grassal, Philip-William; Prinzler, Malte; Leistner, Titus; Rother, Carsten; Nießner, Matthias; Thies, Justus
description	We present Neural Head Avatars, a novel neural representation that explicitly models the surface geometry and appearance of an animatable human avatar that can be used for teleconferencing in AR/VR or other applications in the movie or games industry that rely on a digital human. Our representation can be learned from a monocular RGB portrait video that features a range of different expressions and views. Specifically, we propose a hybrid representation consisting of a morphable model for the coarse shape and expressions of the face, and two feed-forward networks that predict vertex offsets of the underlying mesh as well as a view- and expression-dependent texture. We demonstrate that this representation accurately extrapolates to unseen poses and viewpoints and generates natural expressions while providing sharp texture details. Compared to previous works on head avatars, our method provides a disentangled shape and appearance model of the complete human head (including hair) that is compatible with the standard graphics pipeline. Moreover, it quantitatively and qualitatively outperforms the current state of the art in terms of reconstruction quality and novel-view synthesis.
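The hybrid representation summarized above can be illustrated with a toy NumPy sketch. It is not the authors' implementation: the linear morphable model, the mesh and code sizes, the inputs given to each network, and the use of per-vertex colors as a stand-in for a UV texture are all hypothetical choices made for illustration. A morphable model yields the coarse mesh from shape and expression coefficients, one small feed-forward network adds per-vertex offsets, and a second maps position, view direction, and expression to color.

```python
import numpy as np

# Illustrative sketch only, not the Neural Head Avatars code.
# All sizes, inputs, and the per-vertex color "texture" are assumptions.
N_VERTS = 500   # vertices of the coarse template mesh (hypothetical)
N_SHAPE = 10    # identity (shape) coefficients (hypothetical)
N_EXPR = 5      # expression coefficients (hypothetical)
HIDDEN = 32     # hidden width of each toy feed-forward network

rng = np.random.default_rng(0)

# Toy linear morphable model: mean mesh plus shape and expression bases.
mean_mesh = rng.normal(size=(N_VERTS, 3))
shape_basis = rng.normal(scale=0.01, size=(N_SHAPE, N_VERTS, 3))
expr_basis = rng.normal(scale=0.01, size=(N_EXPR, N_VERTS, 3))


def morphable_model(beta, psi):
    """Coarse geometry: mean + sum_i beta_i * S_i + sum_j psi_j * E_j."""
    return (mean_mesh
            + np.tensordot(beta, shape_basis, axes=1)
            + np.tensordot(psi, expr_basis, axes=1))


def init_mlp(d_in, d_out):
    """Random weights for a two-layer network (d_in -> HIDDEN -> d_out)."""
    return (rng.normal(scale=0.1, size=(d_in, HIDDEN)), np.zeros(HIDDEN),
            rng.normal(scale=0.1, size=(HIDDEN, d_out)), np.zeros(d_out))


def mlp(x, w1, b1, w2, b2):
    """Two-layer feed-forward network with a ReLU hidden layer."""
    return np.maximum(x @ w1 + b1, 0.0) @ w2 + b2


# Network 1 (assumed inputs): vertex position + expression code -> 3D offset.
offset_params = init_mlp(3 + N_EXPR, 3)
# Network 2 (assumed inputs): position + view direction + expression -> RGB.
texture_params = init_mlp(3 + 3 + N_EXPR, 3)


def avatar(beta, psi, view_dir):
    """Return refined vertices and view/expression-dependent colors."""
    coarse = morphable_model(beta, psi)                      # (N_VERTS, 3)
    expr = np.broadcast_to(psi, (N_VERTS, N_EXPR))
    offsets = mlp(np.concatenate([coarse, expr], axis=1), *offset_params)
    verts = coarse + offsets                                 # refined mesh
    view = np.broadcast_to(view_dir / np.linalg.norm(view_dir), (N_VERTS, 3))
    logits = mlp(np.concatenate([verts, view, expr], axis=1), *texture_params)
    colors = 1.0 / (1.0 + np.exp(-logits))                   # sigmoid to (0, 1)
    return verts, colors


verts, colors = avatar(np.zeros(N_SHAPE), np.zeros(N_EXPR),
                       np.array([0.0, 0.0, 1.0]))
```

In this sketch the expression code feeds both networks, so changing it alters geometry and color together, while the view direction only reaches the color network.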
doi_str_mv	10.48550/arxiv.2112.01554
format	Article
fullrecord	<record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2112_01554</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2112_01554</sourcerecordid><originalsourceid>FETCH-LOGICAL-a674-ae7c528c00a5765c54e03dc7e895e1b64690d7b1f75c57bf7265ea5e1794f1e93</originalsourceid><addsrcrecordid>eNotzs2KwjAUhuFsXIh6Aa7MDbQmbU5Ou3TEP_AHRNyW0_QECtUOqYpz96POrL7FCx-PEGOtYpMBqCmFZ_2IE62TWGkA0xfRnu-BGrlmquTsQTcKnfShvchde23dvaEgj6svea4rbruh6HlqOh7970CclovTfB1tD6vNfLaNyKKJiNFBkjmlCNCCA8MqrRxylgPr0hqbqwpL7fHVsPSYWGB6JcyN15ynAzH5u_14i-9QXyj8FG938XGnv65CO50</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Neural Head Avatars from Monocular RGB Videos</title><source>arXiv.org</source><creator>Grassal, Philip-William ; Prinzler, Malte ; Leistner, Titus ; Rother, Carsten ; Nießner, Matthias ; Thies, Justus</creator><creatorcontrib>Grassal, Philip-William ; Prinzler, Malte ; Leistner, Titus ; Rother, Carsten ; Nießner, Matthias ; Thies, Justus</creatorcontrib><description>We present Neural Head Avatars, a novel neural representation that explicitly models the surface geometry and appearance of an animatable human avatar that can be used for teleconferencing in AR/VR or other applications in the movie or games industry that rely on a digital human. Our representation can be learned from a monocular RGB portrait video that features a range of different expressions and views. Specifically, we propose a hybrid representation consisting of a morphable model for the coarse shape and expressions of the face, and two feed-forward networks, predicting vertex offsets of the underlying mesh as well as a view- and expression-dependent texture. We demonstrate that this representation is able to accurately extrapolate to unseen poses and view points, and generates natural expressions while providing sharp texture details. 
Compared to previous works on head avatars, our method provides a disentangled shape and appearance model of the complete human head (including hair) that is compatible with the standard graphics pipeline. Moreover, it quantitatively and qualitatively outperforms current state of the art in terms of reconstruction quality and novel-view synthesis.</description><identifier>DOI: 10.48550/arxiv.2112.01554</identifier><language>eng</language><subject>Computer Science - Computer Vision and Pattern Recognition ; Computer Science - Graphics</subject><creationdate>2021-12</creationdate><rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,780,885</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2112.01554$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2112.01554$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Grassal, Philip-William</creatorcontrib><creatorcontrib>Prinzler, Malte</creatorcontrib><creatorcontrib>Leistner, Titus</creatorcontrib><creatorcontrib>Rother, Carsten</creatorcontrib><creatorcontrib>Nießner, Matthias</creatorcontrib><creatorcontrib>Thies, Justus</creatorcontrib><title>Neural Head Avatars from Monocular RGB Videos</title><description>We present Neural Head Avatars, a novel neural representation that explicitly models the surface geometry and appearance of an animatable human avatar that can be used for teleconferencing in AR/VR or other applications in the movie or games industry that rely on a digital human. Our representation can be learned from a monocular RGB portrait video that features a range of different expressions and views. 
Specifically, we propose a hybrid representation consisting of a morphable model for the coarse shape and expressions of the face, and two feed-forward networks, predicting vertex offsets of the underlying mesh as well as a view- and expression-dependent texture. We demonstrate that this representation is able to accurately extrapolate to unseen poses and view points, and generates natural expressions while providing sharp texture details. Compared to previous works on head avatars, our method provides a disentangled shape and appearance model of the complete human head (including hair) that is compatible with the standard graphics pipeline. Moreover, it quantitatively and qualitatively outperforms current state of the art in terms of reconstruction quality and novel-view synthesis.</description><subject>Computer Science - Computer Vision and Pattern Recognition</subject><subject>Computer Science - Graphics</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNotzs2KwjAUhuFsXIh6Aa7MDbQmbU5Ou3TEP_AHRNyW0_QECtUOqYpz96POrL7FCx-PEGOtYpMBqCmFZ_2IE62TWGkA0xfRnu-BGrlmquTsQTcKnfShvchde23dvaEgj6svea4rbruh6HlqOh7970CclovTfB1tD6vNfLaNyKKJiNFBkjmlCNCCA8MqrRxylgPr0hqbqwpL7fHVsPSYWGB6JcyN15ynAzH5u_14i-9QXyj8FG938XGnv65CO50</recordid><startdate>20211202</startdate><enddate>20211202</enddate><creator>Grassal, Philip-William</creator><creator>Prinzler, Malte</creator><creator>Leistner, Titus</creator><creator>Rother, Carsten</creator><creator>Nießner, Matthias</creator><creator>Thies, Justus</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20211202</creationdate><title>Neural Head Avatars from Monocular RGB Videos</title><author>Grassal, Philip-William ; Prinzler, Malte ; Leistner, Titus ; Rother, Carsten ; Nießner, Matthias ; Thies, 
Justus</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a674-ae7c528c00a5765c54e03dc7e895e1b64690d7b1f75c57bf7265ea5e1794f1e93</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Computer Science - Computer Vision and Pattern Recognition</topic><topic>Computer Science - Graphics</topic><toplevel>online_resources</toplevel><creatorcontrib>Grassal, Philip-William</creatorcontrib><creatorcontrib>Prinzler, Malte</creatorcontrib><creatorcontrib>Leistner, Titus</creatorcontrib><creatorcontrib>Rother, Carsten</creatorcontrib><creatorcontrib>Nießner, Matthias</creatorcontrib><creatorcontrib>Thies, Justus</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Grassal, Philip-William</au><au>Prinzler, Malte</au><au>Leistner, Titus</au><au>Rother, Carsten</au><au>Nießner, Matthias</au><au>Thies, Justus</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Neural Head Avatars from Monocular RGB Videos</atitle><date>2021-12-02</date><risdate>2021</risdate><abstract>We present Neural Head Avatars, a novel neural representation that explicitly models the surface geometry and appearance of an animatable human avatar that can be used for teleconferencing in AR/VR or other applications in the movie or games industry that rely on a digital human. Our representation can be learned from a monocular RGB portrait video that features a range of different expressions and views. Specifically, we propose a hybrid representation consisting of a morphable model for the coarse shape and expressions of the face, and two feed-forward networks, predicting vertex offsets of the underlying mesh as well as a view- and expression-dependent texture. 
We demonstrate that this representation is able to accurately extrapolate to unseen poses and view points, and generates natural expressions while providing sharp texture details. Compared to previous works on head avatars, our method provides a disentangled shape and appearance model of the complete human head (including hair) that is compatible with the standard graphics pipeline. Moreover, it quantitatively and qualitatively outperforms current state of the art in terms of reconstruction quality and novel-view synthesis.</abstract><doi>10.48550/arxiv.2112.01554</doi><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier	DOI: 10.48550/arxiv.2112.01554
language	eng
recordid	cdi_arxiv_primary_2112_01554
source	arXiv.org
subjects	Computer Science - Computer Vision and Pattern Recognition; Computer Science - Graphics
title	Neural Head Avatars from Monocular RGB Videos
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-02T18%3A43%3A15IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Neural%20Head%20Avatars%20from%20Monocular%20RGB%20Videos&rft.au=Grassal,%20Philip-William&rft.date=2021-12-02&rft_id=info:doi/10.48550/arxiv.2112.01554&rft_dat=%3Carxiv_GOX%3E2112_01554%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true
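The url field above is an OpenURL link-resolver request (ctx_ver=Z39.88-2004) in key/encoded-value form, where the rft.* keys describe the cited article. As a small sketch using only the Python standard library, its query string can be decoded to recover the citation data it carries; the helper name decode_openurl is hypothetical.

```python
from urllib.parse import parse_qs, urlsplit

# The record's url field, copied verbatim and split across string literals.
URL = (
    "https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004"
    "&ctx_enc=info:ofi/enc:UTF-8"
    "&ctx_tim=2025-01-02T18%3A43%3A15IST"
    "&url_ver=Z39.88-2004"
    "&url_ctx_fmt=infofi/fmt:kev:mtx:ctx"
    "&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX"
    "&rft_val_fmt=info:ofi/fmt:kev:mtx:journal"
    "&rft.genre=article"
    "&rft.atitle=Neural%20Head%20Avatars%20from%20Monocular%20RGB%20Videos"
    "&rft.au=Grassal,%20Philip-William"
    "&rft.date=2021-12-02"
    "&rft_id=info:doi/10.48550/arxiv.2112.01554"
    "&rft_dat=%3Carxiv_GOX%3E2112_01554%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E"
    "&disable_directlink=true&sfx.directlink=off&sfx.report_link=0"
    "&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true"
)


def decode_openurl(url):
    """Percent-decode the query string into {key: [values...]} (hypothetical helper)."""
    return parse_qs(urlsplit(url).query)


fields = decode_openurl(URL)
print(fields["rft.atitle"][0])  # Neural Head Avatars from Monocular RGB Videos
print(fields["rft.au"][0])      # Grassal, Philip-William
print(fields["rft_id"])         # ['info:doi/10.48550/arxiv.2112.01554', 'info:oai/', 'info:pmid/']
```

parse_qs returns a list per key because OpenURL allows repeated keys; here rft_id appears three times.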