Learning robust representation and sequence constraint for retrieval-based long-term visual place recognition

Bibliographic Details
Published in: Engineering Applications of Artificial Intelligence, 2024-12, Vol. 138, p. 109425, Article 109425
Main Authors: Tan, Yanhai; Ji, Peng; Zhang, Yunzhou; Ge, Fawei; Zhu, Shangdong
Format: Article
Language: English
Description
Summary: Long-term visual place recognition is a crucial task for mobile robots. However, it poses significant challenges due to factors such as seasonal changes and variations in illumination, which make it difficult to acquire reliable visual features. To address this issue, researchers have proposed leveraging robust features, including semantic and depth information, and integrating them into global descriptors. Unfortunately, existing algorithms concentrate primarily on semantic and depth information in isolation, disregarding both the relationship between the two and the dataset’s continuity. In this paper, we employ Generative Adversarial Networks equipped with two discriminators to extract robust features through domain adaptation, effectively utilizing high-quality semantic and depth maps from a virtual dataset. These features encompass not only semantic and depth information but also coupled semantic-depth information, which plays a vital role in long-term visual place recognition. Furthermore, to exploit the dataset’s continuity, this paper introduces a sequence loss that further improves the accuracy of visual place recognition and can be applied to other algorithms. The effectiveness of the proposed method is validated under different appearances, seasons, and illumination conditions. The results demonstrate that, compared to using semantic and depth information alone, our method achieves gains of 2.2%, 1%, and 0.6% on the low-precision metrics for the park, suburban, and urban scenes, respectively, on the Extended Carnegie Mellon University (CMU) dataset. When the proposed sequence loss is incorporated, our method surpasses state-of-the-art domain adaptation algorithms for retrieval-based visual place recognition.
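
The dual-discriminator scheme summarized above can be illustrated with a short sketch. The PyTorch code below is a minimal, hedged illustration only: the encoder, prediction heads, discriminator shapes, class count, and loss form are assumptions made for exposition, not the authors' actual architecture. The idea it demonstrates is that one discriminator judges semantic maps and the other judges depth maps, so the shared encoder is pushed to produce predictions on real-world images that are indistinguishable from high-quality virtual-domain ground truth.

```python
# Minimal sketch of adversarial domain adaptation with two discriminators:
# one for semantic maps, one for depth maps. All layer sizes, the class
# count, and the loss form are illustrative assumptions.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Shared backbone: RGB image -> feature map (also used for retrieval)."""
    def __init__(self, feat_ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, 2, 1), nn.ReLU(inplace=True),
            nn.Conv2d(32, feat_ch, 3, 2, 1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.net(x)

class Discriminator(nn.Module):
    """Judges whether a semantic or depth map is virtual ground truth
    ("real") or a prediction made from a real-world image ("fake")."""
    def __init__(self, in_ch):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 64, 4, 2, 1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 1, 4, 2, 1),  # patch-wise real/fake logits
        )

    def forward(self, x):
        return self.net(x)

num_classes = 19                           # assumed semantic class count
encoder = Encoder()
sem_head = nn.Conv2d(64, num_classes, 1)   # features -> semantic logits
depth_head = nn.Conv2d(64, 1, 1)           # features -> depth map
d_sem, d_depth = Discriminator(num_classes), Discriminator(1)
bce = nn.BCEWithLogitsLoss()

def generator_loss(real_img):
    """Encoder/heads try to fool BOTH discriminators, so predictions on
    real images match the virtual-domain maps (step only their optimizer)."""
    f = encoder(real_img)
    s, d = d_sem(sem_head(f)), d_depth(depth_head(f))
    return bce(s, torch.ones_like(s)) + bce(d, torch.ones_like(d))

def discriminator_loss(real_img, virt_sem, virt_depth):
    """Discriminators separate virtual ground truth (label 1) from
    predictions on real-world images (label 0); virt_sem is assumed to be
    a one-hot (B, num_classes, H, W) map, virt_depth a (B, 1, H, W) map."""
    with torch.no_grad():                  # freeze encoder/heads here
        f = encoder(real_img)
        sem_pred, depth_pred = sem_head(f), depth_head(f)
    s_fake, d_fake = d_sem(sem_pred), d_depth(depth_pred)
    s_real, d_real = d_sem(virt_sem), d_depth(virt_depth)
    return (bce(s_real, torch.ones_like(s_real)) + bce(s_fake, torch.zeros_like(s_fake))
            + bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake)))
```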
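The sequence loss is described only at a high level in the abstract, so the snippet below is one plausible reading rather than the paper's formulation: because consecutive frames of a traversal depict nearly the same place, their descriptors should be closer than those of temporally distant frames. The function name `sequence_loss`, the margin, and the negative-sampling gap are all hypothetical.

```python
# Hedged sketch of a sequence loss exploiting dataset continuity: adjacent
# frames act as positives, temporally distant frames as negatives. This is
# one plausible formulation, not necessarily the one in the paper.
import torch
import torch.nn.functional as F

def sequence_loss(desc: torch.Tensor, margin: float = 0.3) -> torch.Tensor:
    """desc: (T, D) descriptors of T >= 3 consecutive frames of one traversal.

    For each anchor frame t, frame t+1 is a positive and a frame roughly
    half a sequence away is a negative; a triplet hinge enforces the margin.
    """
    desc = F.normalize(desc, dim=1)          # cosine geometry
    T = desc.size(0)
    gap = max(2, T // 2)                     # assumed negative offset
    losses = []
    for t in range(T - 1):
        pos = 1.0 - torch.dot(desc[t], desc[t + 1])          # near: small
        neg = 1.0 - torch.dot(desc[t], desc[(t + gap) % T])  # far: large
        losses.append(F.relu(pos - neg + margin))
    return torch.stack(losses).mean()

# As the abstract notes the loss can be applied to other algorithms, it
# would act as a regularizer on any retrieval backbone, e.g.
# total = retrieval_loss + 0.1 * sequence_loss(seq_descriptors)
# (weight 0.1 is an arbitrary illustrative choice).
```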
ISSN: 0952-1976
DOI: 10.1016/j.engappai.2024.109425