Multi-granularity enhanced feature learning for visible-infrared person re-identification

Bibliographic Details
Published in: The Journal of Supercomputing, 2025, Vol. 81 (1)
Authors: Liu, Huilin; Wu, Yuhao; Tang, Zihan; Li, Xiaolong; Su, Shuzhi; Liang, Xingzhu; Zhang, Pengfei
Format: Article
Language: English
Description

Abstract: Visible-infrared person re-identification aims at the mutual retrieval of pedestrian images captured by non-overlapping RGB and infrared (IR) cameras. Owing to factors such as occlusion, viewpoint changes, and modality differences, it is difficult for a model to extract modality-invariant features of the same pedestrian across modalities. Existing studies mainly map images of the two modalities into a common feature embedding space and learn either coarse-grained or fine-grained modality-shared features, but they neglect the complementarity of multi-granularity feature information. In contrast to these methods, we propose an end-to-end Multi-granularity Enhanced Feature Learning Network (MEFL-Net). In the feature embedding module, we design a three-branch structure for learning modality-shared features at different granularities. Within each branch, we apply horizontal blocking to divide pedestrian features into multiple stripes and extract modality-shared features at various scales. Moreover, to strengthen the multi-granularity features, we embed the Convolutional Block Attention Module (CBAM) in the global branch to suppress background interference and focus attention on pedestrian bodies. Additionally, since features at different scales carry different semantics, we fuse multiple fine-grained features to ensure semantic and feature complementarity. Extensive experiments demonstrate that our method outperforms state-of-the-art methods: on the SYSU-MM01 and RegDB datasets, it achieves 70.47%/73.23% and 91.46%/86.33% Rank-1/mAP, respectively.
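The abstract describes three mechanisms: horizontal blocking of the backbone feature map into stripes at several granularities, a CBAM-refined global branch, and fusion of the resulting multi-granularity features. The PyTorch sketch below illustrates one plausible reading of such a head; the abstract does not specify the stripe counts (2 and 4 here), the pooling, or concatenation as the fusion operator, so those choices, along with the `CBAM` and `MultiGranularityHead` module names, are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Convolutional Block Attention Module (Woo et al.): channel attention
    followed by spatial attention, used here to suppress background clutter."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Channel attention: shared MLP over avg- and max-pooled descriptors.
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        # Spatial attention: 7x7 conv over stacked avg/max channel maps.
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))

class MultiGranularityHead(nn.Module):
    """Hypothetical three-branch head: a CBAM-refined global branch plus two
    local branches that split the feature map into 2 and 4 horizontal stripes."""
    def __init__(self, channels: int = 2048):
        super().__init__()
        self.cbam = CBAM(channels)
        self.pool = nn.AdaptiveAvgPool2d(1)

    def _stripes(self, x: torch.Tensor, parts: int):
        # Horizontal blocking: split along the height axis, pool each stripe.
        return [self.pool(c).flatten(1) for c in torch.chunk(x, parts, dim=2)]

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        g = self.pool(self.cbam(x)).flatten(1)   # global, attention-refined
        p2 = self._stripes(x, 2)                 # coarse local stripes
        p4 = self._stripes(x, 4)                 # fine local stripes
        # Fuse fine-grained stripe features with the global descriptor so the
        # granularities complement each other (concatenation is one option).
        return torch.cat([g] + p2 + p4, dim=1)

feat = torch.randn(8, 2048, 24, 8)   # e.g. ResNet-50 output for a batch of 8
head = MultiGranularityHead(2048)
print(head(feat).shape)              # torch.Size([8, 14336]), i.e. 7 * 2048
```

Under these assumptions, a 24x8 backbone map yields one global vector plus six stripe vectors, concatenated into a single 7x2048-dimensional multi-granularity descriptor.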
ISSN: 0920-8542, 1573-0484
DOI: 10.1007/s11227-024-06731-4