Enhancing Visual Place Recognition With Hybrid Attention Mechanisms in MixVPR
Published in: IEEE Access, 2024, Vol. 12, pp. 159847-159859
Format: Article
Language: English
Online access: Full text
Abstract: Visual Place Recognition (VPR) is a fundamental task in robotics and computer vision, where the ability to recognize locations from visual inputs is crucial for autonomous navigation systems. Traditional methods, which rely on handcrafted features or standard convolutional neural networks (CNNs), struggle with environmental changes that significantly alter a place's appearance. Recent advancements in deep learning have improved VPR by focusing on deep-learned features, enhancing robustness under varying conditions. However, these methods often overlook saliency cues, leading to inefficiencies in dynamic scenes. To address these limitations, we propose an improved MixVPR model that incorporates both self-attention and cross-attention mechanisms through a spatial-wise hybrid attention mechanism. This enhancement integrates spatial saliency cues into the global image embedding, improving accuracy and reliability. We also utilize the DINOv2 visual transformer for robust feature extraction. Extensive experiments on mainstream VPR benchmarks demonstrate that our method achieves superior performance while maintaining computational efficiency. Ablation studies and visualizations further validate the contributions of our attention mechanisms to the model's performance improvement.
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2024.3487171
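
The abstract names the ingredients of the proposed model (self-attention plus cross-attention in a spatial-wise hybrid attention block over DINOv2 features) but not their exact wiring. The PyTorch sketch below is a rough illustration only, not the authors' implementation: the module name `SpatialHybridAttention`, the learned saliency query, the sigmoid re-weighting, and all dimensions are assumptions made for the example.

```python
# Hypothetical sketch of a spatial-wise hybrid attention block, inferred
# loosely from the abstract. Every design choice here is an assumption.
import torch
import torch.nn as nn


class SpatialHybridAttention(nn.Module):
    """Self-attention over patch tokens, then cross-attention from a learned
    saliency query whose output re-weights the spatial features (assumed)."""

    def __init__(self, dim: int = 768, heads: int = 8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.saliency_query = nn.Parameter(torch.randn(1, 1, dim))
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (B, N, dim) patch features, e.g. from a DINOv2 backbone.
        x = self.norm1(tokens)
        x = tokens + self.self_attn(x, x, x, need_weights=False)[0]
        # A learned query cross-attends to the patches, pooling a
        # saliency-oriented summary of the scene.
        q = self.saliency_query.expand(x.size(0), -1, -1)
        k = self.norm2(x)
        summary = self.cross_attn(q, k, k)[0]          # (B, 1, dim)
        # Broadcast the summary back as a per-channel spatial saliency cue.
        return x * torch.sigmoid(summary)


# Usage sketch: DINOv2 ViT-B/14 produces 768-dim tokens; a 224x224 input
# yields 16x16 = 256 patches (sizes chosen here purely for illustration).
feats = torch.randn(2, 256, 768)      # stand-in for backbone patch features
block = SpatialHybridAttention(dim=768)
weighted = block(feats)               # (2, 256, 768), saliency-weighted
print(weighted.shape)
```

In a MixVPR-style pipeline, the re-weighted tokens would then be flattened and passed to the feature-mixing aggregator to form the global image embedding; how the paper actually fuses the two attention paths is not stated in this record.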