SLGaussian: Fast Language Gaussian Splatting in Sparse Views
Main authors: | , , , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Summary: | 3D semantic field learning is crucial for applications like autonomous
navigation, AR/VR, and robotics, where accurate comprehension of 3D scenes from
limited viewpoints is essential. Existing methods struggle under sparse view
conditions, relying on inefficient per-scene multi-view optimizations, which
are impractical for many real-world tasks. To address this, we propose
SLGaussian, a feed-forward method for constructing 3D semantic fields from
sparse viewpoints, allowing direct inference of 3DGS-based scenes. By ensuring
consistent SAM segmentations through video tracking and using low-dimensional
indexing for high-dimensional CLIP features, SLGaussian efficiently embeds
language information in 3D space, offering a robust solution for accurate 3D
scene understanding under sparse view conditions. In experiments on two-view
sparse 3D object querying and segmentation in the LERF and 3D-OVS datasets,
SLGaussian outperforms existing methods in chosen IoU, Localization Accuracy,
and mIoU. Moreover, our model achieves scene inference in under 30 seconds and
open-vocabulary querying in just 0.011 seconds per query. |
---|---|
DOI: | 10.48550/arxiv.2412.08331 |
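
The summary mentions low-dimensional indexing for high-dimensional CLIP features and fast open-vocabulary querying. The following is a minimal illustrative sketch of that general idea, not the authors' implementation: each pixel stores only a small integer index into a table of segment features, so a text query is scored once per table entry and the scores are then looked up per pixel. The names `codebook`, `index_map`, `query_index_map`, the 4-dimensional toy vectors, and the 0.5 threshold are all assumptions made for illustration; real CLIP embeddings have hundreds of dimensions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def query_index_map(index_map, codebook, query):
    """Score each codebook entry once, then look the scores up per pixel."""
    scores = [cosine(feature, query) for feature in codebook]
    return [[scores[i] for i in row] for row in index_map]


# Toy stand-ins (assumption): 4-D vectors instead of real CLIP embeddings.
codebook = [
    [1.0, 0.0, 0.0, 0.0],  # feature of segment 0
    [0.0, 1.0, 0.0, 0.0],  # feature of segment 1
    [0.0, 0.0, 1.0, 1.0],  # feature of segment 2
]

# Each pixel holds a segment index rather than a full feature vector.
index_map = [
    [0, 0, 1],
    [2, 1, 1],
]

# Hypothetical text embedding that points in the direction of segment 1.
query = [0.0, 2.0, 0.0, 0.0]

relevancy = query_index_map(index_map, codebook, query)

# Threshold (assumed value) to turn the relevancy map into a binary mask.
mask = [[score > 0.5 for score in row] for row in relevancy]
```

In this sketch the cost of a query scales with the number of table entries rather than the number of pixels, which is one plausible reason an indexed representation can answer queries quickly.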