Real-Time 3D Visual Perception by Cross-Dimensional Refined Learning

Bibliographic Details
Published in: IEEE Transactions on Circuits and Systems for Video Technology, 2024-10, Vol. 34 (10), p. 10326-10338
Main Authors: Hong, Ziyang; Patrick Yue, C.
Format: Article
Language: English
Description
Abstract: We introduce a novel learning method that can effectively perceive both the geometric structure and the semantic labels of a 3D scene in real time. Existing real-time 3D scene reconstruction approaches often rely on volumetric schemes to regress a Truncated Signed Distance Function (TSDF) as the 3D representation. However, these volumetric approaches primarily focus on ensuring global coherence in the reconstructed scene, which often results in a lack of local geometric detail. To address this limitation, we propose a solution that leverages the latent geometric knowledge present in 2D image features through explicit depth prediction, creating anchored features that are used to refine the learning of occupancy in the TSDF volume. Furthermore, we discover that this cross-dimensional feature refinement methodology can also be applied to the task of semantic segmentation by utilizing semantic priors. As a result, we propose an end-to-end cross-dimensional refinement neural network (CDRNet) that can extract both the 3D mesh and the 3D semantic labeling of a scene in real time. Through experimental evaluation on multiple datasets, we demonstrate that our method achieves state-of-the-art 3D perception capability, improving on the prior art by over 40% in 3D semantic segmentation and by over 18% in geometric reconstruction. These promising results indicate the significant potential of our approach for various industrial applications. A demo video and code can be found on the project page: https://hafred.github.io/cdrnet/
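The abstract describes the key mechanism, anchoring 2D image features in 3D via explicit depth prediction, only at a high level. The sketch below illustrates one plausible reading of that step: per-pixel features are back-projected with predicted depth and accumulated into a voxel feature volume, which could then refine TSDF occupancy learning. This is a minimal sketch, not the authors' implementation; the function name, the nearest-voxel scatter, and all shapes are assumptions made for illustration.

```python
# Minimal sketch (NOT the CDRNet implementation): lift 2D image features
# into a 3D voxel volume using predicted depth, so they can "anchor" the
# refinement of a TSDF feature volume. All names/shapes are hypothetical.
import torch

def anchor_features(feat2d, depth, K, cam2world, vol_origin, voxel_size, vol_dim):
    """feat2d: (C, H, W) image features; depth: (H, W) predicted depth [m];
    K: (3, 3) intrinsics; cam2world: (4, 4) extrinsics; vol_origin: (3,)
    world position of voxel (0, 0, 0); vol_dim: (X, Y, Z) grid size."""
    C, H, W = feat2d.shape
    # Pixel grid -> homogeneous pixel coordinates (u, v, 1).
    v, u = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([u, v, torch.ones_like(u)], dim=-1).float()      # (H, W, 3)
    # Back-project each pixel with its predicted depth: x_cam = depth * K^-1 * pix.
    cam_pts = (pix @ torch.linalg.inv(K).T) * depth.unsqueeze(-1)      # (H, W, 3)
    # Camera -> world coordinates.
    cam_h = torch.cat([cam_pts, torch.ones(H, W, 1)], dim=-1)          # (H, W, 4)
    world = (cam_h @ cam2world.T)[..., :3]                             # (H, W, 3)
    # World -> nearest voxel index; keep points that land inside the grid.
    vox = torch.round((world - vol_origin) / voxel_size).long()        # (H, W, 3)
    X, Y, Z = vol_dim
    valid = ((vox >= 0) & (vox < torch.tensor([X, Y, Z]))).all(-1) & (depth > 0)
    vox, feats = vox[valid], feat2d.permute(1, 2, 0)[valid]            # (N,3),(N,C)
    # Scatter-average the pixel features into the voxel volume.
    flat = vox[:, 0] * Y * Z + vox[:, 1] * Z + vox[:, 2]               # (N,)
    volume = torch.zeros(X * Y * Z, C).index_add_(0, flat, feats)
    count = torch.zeros(X * Y * Z).index_add_(0, flat, torch.ones(len(flat)))
    volume = volume / count.clamp(min=1).unsqueeze(-1)
    return volume.view(X, Y, Z, C).permute(3, 0, 1, 2)                 # (C, X, Y, Z)

# Shape-only smoke test with random inputs.
feat = torch.randn(8, 60, 80)
depth = torch.rand(60, 80) * 3.0
K = torch.tensor([[100.0, 0.0, 40.0], [0.0, 100.0, 30.0], [0.0, 0.0, 1.0]])
vol = anchor_features(feat, depth, K, torch.eye(4),
                      torch.tensor([-2.0, -2.0, 0.0]), 0.04, (96, 96, 96))
print(vol.shape)  # torch.Size([8, 96, 96, 96])
```

Fusing such an anchored feature volume with the coarse volumetric features is one natural way to recover the local geometric detail that, per the abstract, plain volumetric TSDF regression tends to miss.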
ISSN: 1051-8215; 1558-2205
DOI: 10.1109/TCSVT.2024.3406401