One-Shot Neural Fields for 3D Object Understanding
Format: Article
Language: English
Abstract: We present a unified and compact scene representation for robotics, where each object in the scene is depicted by a latent code capturing geometry and appearance. This representation can be decoded for various tasks such as novel view rendering, 3D reconstruction (e.g. recovering depth, point clouds, or voxel maps), collision checking, and stable grasp prediction. We build the representation at test time from a single RGB input image observing the scene from only one viewpoint, by leveraging recent advances in Neural Radiance Fields (NeRF) that learn category-level priors on large multiview datasets and then fine-tune on novel objects from one or few views. We extend the NeRF model with additional grasp outputs and explore ways to leverage this representation for robotics. We find that the recovered representation allows rendering from novel views, including of occluded object parts, as well as predicting successful stable grasps; grasp poses can be decoded directly from the latent representation with an implicit grasp decoder. We experiment in both simulation and the real world and demonstrate robust robotic grasping using this compact representation. Website: https://nerfgrasp.github.io
DOI: 10.48550/arxiv.2210.12126
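The abstract describes a per-object latent code that conditions a neural field producing geometry, appearance, and grasp outputs. The sketch below is only a minimal illustration of that idea, not the authors' implementation: the module name `ObjectLatentField`, the layer sizes, the 128-dimensional latent, and the per-point grasp head are assumptions. A full NeRF pipeline would also involve positional encoding, view directions, and volume rendering, and the paper's implicit grasp decoder predicts grasp poses rather than per-point scores.

```python
# Minimal sketch (illustrative only): a latent-conditioned field with an
# extra grasp head alongside density and color. All names/sizes are assumed.
import torch
import torch.nn as nn

class ObjectLatentField(nn.Module):
    def __init__(self, latent_dim=128, hidden=256):
        super().__init__()
        # Shared trunk conditioned on the per-object latent code.
        self.trunk = nn.Sequential(
            nn.Linear(3 + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.density_head = nn.Linear(hidden, 1)  # volumetric density
        self.color_head = nn.Linear(hidden, 3)    # RGB appearance
        self.grasp_head = nn.Linear(hidden, 1)    # grasp-success logit (simplified)

    def forward(self, xyz, z):
        # xyz: (N, 3) query points; z: (latent_dim,) object latent code.
        h = self.trunk(torch.cat([xyz, z.expand(xyz.shape[0], -1)], dim=-1))
        density = torch.relu(self.density_head(h))
        color = torch.sigmoid(self.color_head(h))
        grasp_logit = self.grasp_head(h)
        return density, color, grasp_logit

# Example query: evaluate the field for one object at random points.
field = ObjectLatentField()
z = torch.zeros(128)               # latent code recovered at test time
pts = torch.rand(1024, 3) * 2 - 1  # points in a normalized object box
density, color, grasp_logit = field(pts, z)
```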