Learning Object Localization and 6D Pose Estimation from Simulation and Weakly Labeled Real Images
Main authors: | , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Abstract: | This work proposes a process for efficiently training a point-wise object
detector that enables localizing objects and computing their 6D poses in
cluttered and occluded scenes. Accurate pose estimation is typically a
requirement for robust robotic grasping and manipulation of objects placed in
cluttered, tight environments, such as a shelf with multiple objects. To
minimize the human labor required for annotation, the proposed object detector
is first trained in simulation by using automatically annotated synthetic
images. We then show that the performance of the detector can be substantially
improved by using a small set of weakly annotated real images, where a human
provides only a list of objects present in each image without indicating the
location of the objects. To close the gap between real and synthetic images, we
adopt a domain adaptation approach through adversarial training. The detector
resulting from this training process can be used to localize objects by using
its per-object activation maps. In this work, we use the activation maps to
guide the search for the 6D poses of objects. Our proposed approach is evaluated on
several publicly available datasets for pose estimation. We also evaluate our
model on classification and localization in unsupervised and semi-supervised
settings. The results clearly indicate that this approach could provide an
efficient way toward fully automating the training process of computer vision
models used in robotics. |
---|---|
DOI: | 10.48550/arxiv.1806.06888 |