Improving Self-Supervised Learning of Transparent Category Poses With Language Guidance and Implicit Physical Constraints


Bibliographic Details
Published in: IEEE Robotics and Automation Letters, 2024-09, Vol. 9 (9), pp. 8114-8121
Main authors: Wang, Pengyuan; Garattoni, Lorenzo; Meier, Sven; Navab, Nassir; Busam, Benjamin
Format: Article
Language: English
Description
Abstract: Accurate object pose estimation is crucial for robotic applications, and recent trends in category-level pose estimation show great potential for applications that encounter a large variety of similar objects, as is common in home environments. Although frequent in such environments, photometrically challenging objects with transparency, such as glasses, are poorly handled by current methods. In particular, using self-supervision to bridge the sim2real domain gap is difficult for transparent objects due to strong background changes and depth artifacts. To address this, we propose a novel pipeline that takes language guidance and implicit physical constraints for 2D and 3D self-supervision. Specifically, we utilize language guidance to obtain accurate 2D object segmentation that is robust to background changes. Further 3D self-supervision is achieved through contact and normal constraints derived from polarization inputs with a differentiable renderer. Instead of explicitly leveraging depth measurements, we reason about implicit physical constraints for self-supervision. Extensive experiments demonstrate the superior performance of our self-supervision approach over baselines on both a self-collected dataset and public benchmarks, addressing photometric challenges.
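
The abstract describes three self-supervision signals: a language-guided 2D segmentation loss, a contact constraint against the supporting plane, and a normal constraint from polarization inputs rendered with a differentiable renderer. The following is a minimal, hypothetical sketch (not the authors' code) of how such losses could be combined in PyTorch; all function names, tensor shapes, the plane parameterization, and the loss weights are illustrative assumptions.

# Minimal sketch of the three self-supervision terms described in the abstract.
# Assumptions: the rendered mask and normals come from a differentiable renderer,
# the language-guided mask from a text-prompted segmenter, and the normals from a
# polarization-based estimator; none of these components are shown here.
import torch
import torch.nn.functional as F


def mask_iou_loss(rendered_mask, language_mask):
    """2D self-supervision: soft IoU between the mask rendered at the predicted
    pose and a segmentation obtained from a language prompt (robust to background)."""
    inter = (rendered_mask * language_mask).sum()
    union = (rendered_mask + language_mask - rendered_mask * language_mask).sum()
    return 1.0 - inter / union.clamp(min=1e-6)


def contact_loss(object_points_world, plane_normal, plane_offset):
    """Implicit physical constraint: the object rests on the supporting plane,
    so its lowest points should lie on the plane and no point may penetrate it."""
    signed_dist = object_points_world @ plane_normal + plane_offset  # (N,)
    lowest = signed_dist.topk(k=min(32, signed_dist.numel()), largest=False).values
    resting = lowest.abs().mean()              # pull contact points onto the plane
    penetration = F.relu(-signed_dist).mean()  # penalize points below the plane
    return resting + penetration


def normal_loss(rendered_normals, polarization_normals, valid_mask):
    """3D self-supervision: cosine agreement between rendered surface normals
    and normals estimated from polarization images, over valid pixels only."""
    cos = F.cosine_similarity(rendered_normals, polarization_normals, dim=-1)
    return ((1.0 - cos) * valid_mask).sum() / valid_mask.sum().clamp(min=1.0)


def self_supervision_loss(rendered_mask, language_mask,
                          object_points_world, plane_normal, plane_offset,
                          rendered_normals, polarization_normals, valid_mask,
                          w_mask=1.0, w_contact=0.5, w_normal=0.5):
    """Weighted sum of the 2D and 3D self-supervision terms (weights are illustrative)."""
    return (w_mask * mask_iou_loss(rendered_mask, language_mask)
            + w_contact * contact_loss(object_points_world, plane_normal, plane_offset)
            + w_normal * normal_loss(rendered_normals, polarization_normals, valid_mask))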
ISSN: 2377-3766
DOI: 10.1109/LRA.2024.3440732