Gaussian Surfel Splatting for Live Human Performance Capture
High-quality real-time rendering using user-affordable capture rigs is an essential property of human performance capture systems for real-world applications. However, state-of-the-art performance capture methods may not yield satisfactory rendering results under a very sparse (e.g., four) capture setting. Specifically, neural radiance field (NeRF)-based methods and 3D Gaussian Splatting (3DGS)-based methods tend to produce local geometry errors for unseen performers, while occupancy field (PIFu)-based methods often produce unrealistic rendering results. In this paper, we propose a novel generalizable neural approach to reconstruct and render performers in high quality from very sparse RGBD streams. The core of our method is a novel point-based generalizable human (PGH) representation conditioned on pixel-aligned RGBD features. The PGH representation learns a surface implicit function for the regression of surface points and a Gaussian implicit function for parameterizing the radiance fields of the regressed surface points with 2D Gaussian surfels, and uses surfel splatting for fast rendering. We learn this hybrid human representation via two novel networks. First, we propose a novel point-regressing network (PRNet) with a depth-guided point cloud initialization (DPI) method to regress an accurate surface point cloud based on the denoised depth information. Second, we propose a novel neural blending-based surfel splatting network (SPNet) to render high-quality geometries and appearances in novel views based on the regressed surface points and high-resolution RGBD features of adjacent views. Our method produces free-view human performance videos of 1K resolution at 12 fps on average. Experiments on two benchmarks show that our method outperforms state-of-the-art human performance capture methods.
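The rendering step at the heart of the abstract, splatting regressed surface points as 2D Gaussian surfels with front-to-back alpha compositing, can be illustrated in miniature. The following is a minimal NumPy sketch, not the paper's implementation: it assumes a simple pinhole camera, isotropic screen-space footprints (the paper's surfels are oriented 2D Gaussians), and a brute-force per-pixel loop; the names (`project`, `splat_surfels`) and all parameters are hypothetical.

```python
# Minimal sketch of 2D Gaussian surfel splatting under simplifying assumptions:
# isotropic footprints, a pinhole camera, and O(N * H * W) brute-force blending.
import numpy as np

def project(points, K, R, t):
    """Project world-space points into the image plane of a pinhole camera."""
    cam = points @ R.T + t            # world -> camera coordinates
    z = cam[:, 2:3]
    uv = (cam @ K.T)[:, :2] / z       # perspective divide to pixel coordinates
    return uv, z[:, 0]

def splat_surfels(centers, colors, opacities, radii, K, R, t, H, W):
    """Render surfels by alpha-compositing 2D Gaussian footprints front to back."""
    uv, depth = project(centers, K, R, t)
    order = np.argsort(depth)                  # near-to-far compositing order
    img = np.zeros((H, W, 3))
    transmittance = np.ones((H, W))            # fraction of light still unblocked
    ys, xs = np.mgrid[0:H, 0:W]
    for i in order:
        if depth[i] <= 0:
            continue                           # surfel behind the camera
        sigma = radii[i] * K[0, 0] / depth[i]  # screen-space footprint in pixels
        d2 = (xs - uv[i, 0]) ** 2 + (ys - uv[i, 1]) ** 2
        alpha = opacities[i] * np.exp(-0.5 * d2 / sigma ** 2)
        img += (transmittance * alpha)[..., None] * colors[i]
        transmittance *= 1.0 - alpha           # attenuate for surfels behind
    return img
```

A real-time renderer would instead rasterize oriented surfels in depth-sorted tiles on the GPU, but the front-to-back compositing shown here is the principle that makes splatting fast relative to volumetric ray marching.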
Published in: | ACM Transactions on Graphics, 2024-12, Vol. 43 (6), pp. 1-17, Article 263 |
---|---|
Main authors: | Dong, Zheng; Xu, Ke; Gao, Yaoan; Bao, Hujun; Xu, Weiwei; Lau, Rynson W. H. |
Format: | Article |
Language: | English |
Subjects: | Computer graphics; Computing methodologies; Image manipulation; Image-based rendering; Point-based models; Shape modeling |
Online access: | Full text |
DOI: | 10.1145/3687993 |
Publisher: | ACM, New York, NY, USA |
ISSN: | 0730-0301 |
EISSN: | 1557-7368 |
Source: | ACM Digital Library Complete |