Masked Space-Time Hash Encoding for Efficient Dynamic Scene Reconstruction

In this paper, we propose the Masked Space-Time Hash encoding (MSTH), a novel method for efficiently reconstructing dynamic 3D scenes from multi-view or monocular videos. Based on the observation that dynamic scenes often contain substantial static areas, which lead to redundant storage and computation, MSTH represents a dynamic scene as a weighted combination of a 3D hash encoding and a 4D hash encoding. The weights of the two components are given by a learnable mask, guided by an uncertainty-based objective, that reflects the spatial and temporal importance of each 3D position. With this design, our method reduces the hash collision rate by avoiding redundant queries and modifications on static areas, making it feasible to represent a large number of space-time voxels with small hash tables. Besides, without the need to fit a large number of temporally redundant features independently, our method is easier to optimize and converges rapidly, requiring only twenty minutes of training for a 300-frame dynamic scene. As a result, MSTH consistently outperforms previous methods with only 20 minutes of training time and 130 MB of memory storage. Code is available at https://github.com/masked-spacetime-hashing/msth
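
To make the mechanism in the abstract concrete, below is a minimal PyTorch sketch of the masked blend of a 3D and a 4D hash encoding. It is an illustration under stated assumptions, not the authors' implementation (see the repository linked above): TinyHashEncoding is a toy single-level stand-in for a real multiresolution hash grid, all module names and sizes are hypothetical, and the uncertainty-based objective that guides the mask is omitted.

```python
import torch
import torch.nn as nn


class TinyHashEncoding(nn.Module):
    """Toy single-level stand-in for a multiresolution hash grid
    (Instant-NGP style): nearest-vertex lookup into a hashed feature
    table, kept deliberately small so the masked blend stays readable."""

    def __init__(self, dim_in, table_size=2**14, dim_feat=8, resolution=64):
        super().__init__()
        self.table = nn.Parameter(torch.randn(table_size, dim_feat) * 1e-2)
        self.table_size = table_size
        self.resolution = resolution
        # Large primes for spatial hashing, one per input dimension.
        primes = torch.tensor([1, 2654435761, 805459861, 3674653429])
        self.register_buffer("primes", primes[:dim_in])

    def forward(self, coords):
        # coords: (N, dim_in) in [0, 1].
        idx = (coords * (self.resolution - 1)).long()       # nearest grid vertex
        h = (idx * self.primes).sum(-1) % self.table_size   # simple spatial hash
        return self.table[h]                                # (N, dim_feat)


class MaskedSpaceTimeHash(nn.Module):
    """Sketch of the masked blend: a learnable spatial mask m(x) weights a
    static 3D hash feature against a dynamic 4D space-time hash feature."""

    def __init__(self, dim_feat=8):
        super().__init__()
        self.hash3d = TinyHashEncoding(dim_in=3, dim_feat=dim_feat)  # static branch
        self.hash4d = TinyHashEncoding(dim_in=4, dim_feat=dim_feat)  # dynamic branch
        self.mask_mlp = nn.Sequential(nn.Linear(3, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, x, t):
        # x: (N, 3) positions in [0, 1]; t: (N, 1) normalized time.
        m = torch.sigmoid(self.mask_mlp(x))                 # mask in (0, 1), position-dependent
        f_static = self.hash3d(x)                           # queried without time
        f_dynamic = self.hash4d(torch.cat([x, t], dim=-1))  # queried with space-time
        # Where m -> 1 the point is treated as static, so queries to the 4D
        # table (and its hash collisions) concentrate on dynamic regions.
        return m * f_static + (1.0 - m) * f_dynamic, m


x, t = torch.rand(1024, 3), torch.rand(1024, 1)
features, mask = MaskedSpaceTimeHash()(x, t)  # features: (1024, 8), mask: (1024, 1)
```

The key design point the abstract emphasizes shows up here as the convex combination: static regions are served by the time-independent 3D table, so the 4D table's limited capacity is spent only on genuinely dynamic content.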

Bibliographic Details

Main authors: Wang, Feng; Chen, Zilong; Wang, Guokang; Song, Yafei; Liu, Huaping
Format: Article
Language: English
Subjects: Computer Science - Computer Vision and Pattern Recognition
DOI: 10.48550/arxiv.2310.17527
Published: 2023-10-26
Source: arXiv.org
Online access: https://arxiv.org/abs/2310.17527