Real-Time 3D Visual Perception by Cross-Dimensional Refined Learning

Bibliographic Details
Published in: IEEE Transactions on Circuits and Systems for Video Technology, 2024-10, Vol. 34 (10), p. 10326-10338
Main Authors: Hong, Ziyang; Patrick Yue, C.
Format: Article
Language: English
Description
Abstract: We introduce a novel learning method that can effectively perceive both the geometric structure and the semantic labels of a 3D scene in real time. Existing real-time 3D scene reconstruction approaches often rely on volumetric schemes to regress a Truncated Signed Distance Function (TSDF) as the 3D representation. However, these volumetric approaches primarily focus on ensuring global coherence in the reconstructed scene, which often results in a lack of local geometric detail. To address this limitation, we propose a solution that leverages the latent geometric knowledge present in 2D image features through explicit depth prediction, creating anchored features that are used to refine the learning of occupancy in the TSDF volume. Furthermore, we discover that this cross-dimensional feature refinement methodology can also be applied to the task of semantic segmentation by utilizing semantic priors. As a result, we propose an end-to-end cross-dimensional refinement neural network (CDRNet) that can extract both the 3D mesh and the 3D semantic labeling of a scene in real time. Through experimental evaluation on multiple datasets, we demonstrate that our method achieves state-of-the-art 3D perception capability, improving on the prior art by over 40% in 3D semantic segmentation and by over 18% in geometric reconstruction. These promising results indicate the significant potential of our approach for various industrial applications. A demo video and code can be found on the project page: https://hafred.github.io/cdrnet/
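The abstract describes the key mechanism, anchoring 2D image features in 3D via explicit depth prediction, only at a high level. The sketch below illustrates one plausible reading of that step: per-pixel features are back-projected with predicted depth and accumulated into a voxel feature volume, which could then refine TSDF occupancy learning. This is a minimal sketch, not the authors' implementation; the function name, the nearest-voxel scatter, and all shapes are assumptions made for illustration.

```python
# Minimal sketch (NOT the CDRNet implementation): lift 2D image features
# into a 3D voxel volume using predicted depth, so they can "anchor" the
# refinement of a TSDF feature volume. All names/shapes are hypothetical.
import torch

def anchor_features(feat2d, depth, K, cam2world, vol_origin, voxel_size, vol_dim):
    """feat2d: (C, H, W) image features; depth: (H, W) predicted depth [m];
    K: (3, 3) intrinsics; cam2world: (4, 4) extrinsics; vol_origin: (3,)
    world position of voxel (0, 0, 0); vol_dim: (X, Y, Z) grid size."""
    C, H, W = feat2d.shape
    # Pixel grid -> homogeneous pixel coordinates (u, v, 1).
    v, u = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([u, v, torch.ones_like(u)], dim=-1).float()      # (H, W, 3)
    # Back-project each pixel with its predicted depth: x_cam = depth * K^-1 * pix.
    cam_pts = (pix @ torch.linalg.inv(K).T) * depth.unsqueeze(-1)      # (H, W, 3)
    # Camera -> world coordinates.
    cam_h = torch.cat([cam_pts, torch.ones(H, W, 1)], dim=-1)          # (H, W, 4)
    world = (cam_h @ cam2world.T)[..., :3]                             # (H, W, 3)
    # World -> nearest voxel index; keep points that land inside the grid.
    vox = torch.round((world - vol_origin) / voxel_size).long()        # (H, W, 3)
    X, Y, Z = vol_dim
    valid = ((vox >= 0) & (vox < torch.tensor([X, Y, Z]))).all(-1) & (depth > 0)
    vox, feats = vox[valid], feat2d.permute(1, 2, 0)[valid]            # (N,3),(N,C)
    # Scatter-average the pixel features into the voxel volume.
    flat = vox[:, 0] * Y * Z + vox[:, 1] * Z + vox[:, 2]               # (N,)
    volume = torch.zeros(X * Y * Z, C).index_add_(0, flat, feats)
    count = torch.zeros(X * Y * Z).index_add_(0, flat, torch.ones(len(flat)))
    volume = volume / count.clamp(min=1).unsqueeze(-1)
    return volume.view(X, Y, Z, C).permute(3, 0, 1, 2)                 # (C, X, Y, Z)

# Shape-only smoke test with random inputs.
feat = torch.randn(8, 60, 80)
depth = torch.rand(60, 80) * 3.0
K = torch.tensor([[100.0, 0.0, 40.0], [0.0, 100.0, 30.0], [0.0, 0.0, 1.0]])
vol = anchor_features(feat, depth, K, torch.eye(4),
                      torch.tensor([-2.0, -2.0, 0.0]), 0.04, (96, 96, 96))
print(vol.shape)  # torch.Size([8, 96, 96, 96])
```

Fusing such an anchored feature volume with the coarse volumetric features is one natural way to recover the local geometric detail that, per the abstract, plain volumetric TSDF regression tends to miss.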
ISSN: 1051-8215; 1558-2205
DOI: 10.1109/TCSVT.2024.3406401