Visual Place Recognition via a Multitask Learning Method With Attentive Feature Aggregation
Visual place recognition has gained popularity in recent years. Mainstream convolutional neural network-based methods formulate it as a ranking task and optimize it in the paradigm of deep metric learning, however, the ranking-motivated losses concern only the ranking relationship for each query ima...
Gespeichert in:
Veröffentlicht in: | IEEE transactions on cognitive and developmental systems 2023-09, Vol.15 (3), p.1263-1278 |
---|---|
Hauptverfasser: | , , , , |
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | Visual place recognition has gained popularity in recent years. Mainstream convolutional neural network-based methods formulate it as a ranking task and optimize it in the paradigm of deep metric learning, however, the ranking-motivated losses concern only the ranking relationship for each query image and the compactness of intraplace feature distribution is seldom considered. It is still challenging due to varying viewpoints, illuminations, and even dynamic objects. In this article, a novel multitask learning framework is proposed, which combines the existing triplet ranking task and our designed binary classification task to jointly optimize the network for better generalization capability. Specifically, a binary classification network with the corresponding binary cross-entropy loss is designed in the classification task. In this way, the intraplace feature compactness and interplace feature separability are reinforced. At the testing stage, this classification network is discarded without increasing the computation cost. Furthermore, an attention module is presented to promote the network to concentrate on the salient regions by assigning different importance to each spatial position. Our method achieves the top-10 recalls of 97.27%, 94.6%, and 96.93% on Pitts250k-test, Tokyo 24/7, and TokyoTM-val data sets, respectively. Extensive experiments prove that the proposed network can learn discriminative global features with better robustness to viewpoints and environmental variations. |
---|---|
ISSN: | 2379-8920 2379-8939 |
DOI: | 10.1109/TCDS.2022.3206500 |