Real-world Instance-specific Image Goal Navigation: Bridging Domain Gaps via Contrastive Learning
Main authors: | , , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | Improving instance-specific image goal navigation (InstanceImageNav), which
locates the identical object in a real-world environment from a query image, is
essential for robotic systems to assist users in finding desired objects. The
challenge lies in the domain gap between low-quality images observed by the
moving robot, characterized by motion blur and low resolution, and high-quality
query images provided by the user. Such domain gaps could significantly reduce
the task success rate but have not been the focus of previous work. To address
this, we propose a novel method called Few-shot Cross-quality Instance-aware
Adaptation (CrossIA), which employs contrastive learning with an instance
classifier to align features between many low-quality and a few high-quality images.
This approach effectively reduces the domain gap by bringing the latent
representations of cross-quality images closer on an instance basis.
Additionally, the system integrates an object image collection with a
pre-trained deblurring model to enhance the observed image quality. Our method
fine-tunes the SimSiam model, pre-trained on ImageNet, using CrossIA. We
evaluated our method's effectiveness through an InstanceImageNav task with 20
different types of instances, where the robot must find, in a real-world
environment, the instance shown in a high-quality query image. Our experiments showed
that our method improves the task success rate by up to three times compared to
the baseline, a conventional approach based on SuperGlue. These findings
highlight the potential of leveraging contrastive learning and image
enhancement techniques to bridge the domain gap and improve object localization
in robotic applications. The project website is
https://emergentsystemlabstudent.github.io/DomainBridgingNav/. |
---|---|
DOI: | 10.48550/arxiv.2404.09645 |
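
The summary describes aligning low- and high-quality image features per instance with contrastive learning plus an instance classifier, but it does not give the loss. The following is a minimal sketch of one plausible form, assuming a SimSiam-style symmetrized negative cosine similarity between paired embeddings and a cross-entropy classification term. The function names, the `weight` parameter, and the way the two terms are combined are illustrative assumptions, not the authors' implementation. It only computes loss values; the stop-gradient and the training loop are omitted.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def alignment_loss(p_low, z_high, p_high, z_low):
    """Symmetrized negative cosine similarity (SimSiam-style).

    p_* stand for predictor outputs and z_* for projector outputs of the
    low- and high-quality views of the SAME instance. A real training loop
    would treat z_* as constants (stop-gradient); here we only evaluate.
    """
    return -0.5 * (cosine_similarity(p_low, z_high)
                   + cosine_similarity(p_high, z_low))


def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample, using a stable log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[label]


def combined_loss(p_low, z_high, p_high, z_low, logits, label, weight=1.0):
    """Hypothetical combination: alignment term + weighted instance term."""
    return (alignment_loss(p_low, z_high, p_high, z_low)
            + weight * cross_entropy(logits, label))
```

With perfectly aligned embeddings the alignment term reaches its minimum of -1, so any remaining loss comes from the instance classifier. Orthogonal embeddings give an alignment term of 0.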