Multiscale Reference-Aided Attentive Feature Aggregation for Person Re-Identification
Published in: | IEEE Access 2021, Vol. 9, pp. 141667-141677 |
---|---|
Main authors: | , |
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Full text |
Abstract: | In person re-identification (Re-ID), increasing the diversity of pedestrian features can improve recognition accuracy. In standard convolutional neural networks (CNNs), the receptive fields of the neurons in each layer are designed to have the same size. As a result, in complex Re-ID tasks, standard CNNs extract local features but do not obtain satisfactory global features from the images. Local feature learning methods help to obtain richer features, but they focus on the most significant local features and ignore the correlations between the features of different parts of the human body. To address these problems, a new multiscale reference-aided attentive feature aggregation (MS-RAFA) mechanism is proposed, consisting of three main modules. First, to extract the most significant local features and strengthen the correlations between the features of different body parts, an autoselect module (ASM) is designed: an attention mechanism that stacks structural information and spatial relations to form new features. Second, to fuse multiscale features from the multiple output branches of the backbone network and increase feature diversity, a multilayer feature fusion module (MFFM) is proposed, which enables the model to mine features that are hidden by salient features and to learn features better. Third, to supervise the MFFM and help the network obtain better features for recognition, a multiple supervision mechanism is proposed. Experimental results demonstrate that the proposed method outperforms state-of-the-art methods on three large-scale datasets. |
ISSN: | 2169-3536 |
DOI: | 10.1109/ACCESS.2021.3119576 |