Gaussian Surfel Splatting for Live Human Performance Capture
High-quality real-time rendering using user-affordable capture rigs is an essential property of human performance capture systems for real-world applications. However, state-of-the-art performance capture methods may not yield satisfactory rendering results under a very sparse (e.g., four) capture setting. Specifically, neural radiance field (NeRF)-based methods and 3D Gaussian Splatting (3DGS)-based methods tend to produce local geometry errors for unseen performers, while occupancy field (PIFu)-based methods often produce unrealistic rendering results. In this paper, we propose a novel generalizable neural approach to reconstruct and render performers in high quality from very sparse RGBD streams. The core of our method is a novel point-based generalizable human (PGH) representation conditioned on pixel-aligned RGBD features. The PGH representation learns a surface implicit function for the regression of surface points and a Gaussian implicit function for parameterizing the radiance fields of the regressed surface points with 2D Gaussian surfels, and uses surfel splatting for fast rendering. We learn this hybrid human representation via two novel networks. First, we propose a novel point-regressing network (PRNet) with a depth-guided point cloud initialization (DPI) method to regress an accurate surface point cloud based on the denoised depth information. Second, we propose a novel neural blending-based surfel splatting network (SPNet) to render high-quality geometries and appearances in novel views based on the regressed surface points and high-resolution RGBD features of adjacent views. Our method produces free-view human performance videos of 1K resolution at 12 fps on average. Experiments on two benchmarks show that our method outperforms state-of-the-art human performance capture methods.
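The rendering step at the heart of the abstract, splatting regressed surface points as 2D Gaussian surfels with front-to-back alpha compositing, can be illustrated in miniature. The following is a minimal NumPy sketch, not the paper's implementation: it assumes a simple pinhole camera, isotropic screen-space footprints (the paper's surfels are oriented 2D Gaussians), and a brute-force per-pixel loop; the names (`project`, `splat_surfels`) and all parameters are hypothetical.

```python
# Minimal sketch of 2D Gaussian surfel splatting under simplifying assumptions:
# isotropic footprints, a pinhole camera, and O(N * H * W) brute-force blending.
import numpy as np

def project(points, K, R, t):
    """Project world-space points into the image plane of a pinhole camera."""
    cam = points @ R.T + t            # world -> camera coordinates
    z = cam[:, 2:3]
    uv = (cam @ K.T)[:, :2] / z       # perspective divide to pixel coordinates
    return uv, z[:, 0]

def splat_surfels(centers, colors, opacities, radii, K, R, t, H, W):
    """Render surfels by alpha-compositing 2D Gaussian footprints front to back."""
    uv, depth = project(centers, K, R, t)
    order = np.argsort(depth)                  # near-to-far compositing order
    img = np.zeros((H, W, 3))
    transmittance = np.ones((H, W))            # fraction of light still unblocked
    ys, xs = np.mgrid[0:H, 0:W]
    for i in order:
        if depth[i] <= 0:
            continue                           # surfel behind the camera
        sigma = radii[i] * K[0, 0] / depth[i]  # screen-space footprint in pixels
        d2 = (xs - uv[i, 0]) ** 2 + (ys - uv[i, 1]) ** 2
        alpha = opacities[i] * np.exp(-0.5 * d2 / sigma ** 2)
        img += (transmittance * alpha)[..., None] * colors[i]
        transmittance *= 1.0 - alpha           # attenuate for surfels behind
    return img
```

A real-time renderer would instead rasterize oriented surfels in depth-sorted tiles on the GPU, but the front-to-back compositing shown here is the principle that makes splatting fast relative to volumetric ray marching.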
Published in: | ACM Transactions on Graphics, 2024-12, Vol. 43 (6), pp. 1-17, Article 263 |
---|---|
Main authors: | Dong, Zheng; Xu, Ke; Gao, Yaoan; Bao, Hujun; Xu, Weiwei; Lau, Rynson W. H. |
Format: | Article |
Language: | English |
Subjects: | Computer graphics; Computing methodologies; Image manipulation; Image-based rendering; Point-based models; Shape modeling |
Online access: | Full text |
DOI: | 10.1145/3687993 |
Publisher: | ACM, New York, NY, USA |
ISSN: | 0730-0301 |
EISSN: | 1557-7368 |
Source: | ACM Digital Library Complete |