Self-Supervised Geometric Correspondence for Category-Level 6D Object Pose Estimation in the Wild
Main authors: | , , , , , |
---|---|
Format: | Article |
Language: | eng |
Summary: | While 6D object pose estimation has wide applications across computer vision
and robotics, it remains far from being solved due to the lack of annotations.
The problem becomes even more challenging when moving to category-level 6D
pose, which requires generalization to unseen instances. Current approaches are
restricted by leveraging annotations from simulation or collected from humans.
In this paper, we overcome this barrier by introducing a self-supervised
learning approach trained directly on large-scale real-world object videos for
category-level 6D pose estimation in the wild. Our framework reconstructs the
canonical 3D shape of an object category and learns dense correspondences
between input images and the canonical shape via surface embedding. For
training, we propose novel geometrical cycle-consistency losses which construct
cycles across 2D-3D spaces, across different instances and different time
steps. The learned correspondence can be applied for 6D pose estimation and
other downstream tasks such as keypoint transfer. Surprisingly, our method,
without any human annotations or simulators, can achieve on-par or even better
performance than previous supervised or semi-supervised methods on in-the-wild
images. Our project page is: https://kywind.github.io/self-pose . |
---|---|
DOI: | 10.48550/arxiv.2210.07199 |