Spatial pyramid attention and affinity inference embedding for unsupervised person re-identification

Bibliographic details
Published in: Computers & electrical engineering 2025-04, Vol.123, p.110126, Article 110126
Main authors: Duan, Qianyue; Tao, Huanjie
Format: Article
Language: English
Description
Abstract: Unsupervised person re-identification (Re-ID) aims to learn discriminative features for retrieving persons using unlabeled data. Most existing unsupervised person Re-ID methods adopt a generic backbone to extract features, cluster those features to generate pseudo-labels, and then use the pseudo-labels to train the model. However, without accurate category supervision, the generic backbone inevitably extracts interfering features, which degrade the quality of the pseudo-labels. In addition, many methods match persons using only the similarity between query and gallery images and ignore the affinity information between gallery images. To address these issues, we propose a spatial pyramid attention and affinity inference embedding network for unsupervised person Re-ID. We explore the benefit of attention mechanisms in unsupervised person Re-ID, where research is currently limited. We adopt spatial pyramid attention (SPA) to aggregate structural information at different scales and to ensure that structural information is sufficiently used during attention learning. With the help of SPA, the model extracts fewer interfering features and can therefore learn more discriminative features for clustering, which improves pseudo-label quality. In addition, the affinity inference module (AIM) optimizes the distance between the query images and the gallery images by additionally using the affinity information between gallery images. Extensive experiments on three datasets demonstrate that our method achieves competitive performance. In particular, our method achieves a Rank-1 accuracy of 77.1% on the MSMT17 dataset, outperforming the recent unsupervised work DCMIP by more than 7%. Our code will be released at: https://github.com/wanderer1230/SPAENet.
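
The idea of using gallery-to-gallery affinity when ranking can be illustrated with a minimal sketch. This is not the authors' AIM, whose details are not given in this record. It is a generic neighbour-smoothing scheme, assumed here only for illustration: each query-to-gallery distance is blended with the distances of that gallery image's nearest gallery neighbours. The function name `refine_distances`, its parameters `k`, `alpha`, and `temperature`, and the toy distance values are all hypothetical.

```python
import math


def refine_distances(query_dist, gallery_dist, k=2, alpha=0.5, temperature=0.2):
    """Blend each query-to-gallery distance with those of nearby gallery images.

    query_dist:   list of N distances from one query to each gallery image.
    gallery_dist: N x N list of pairwise gallery-to-gallery distances.
    (Hypothetical helper, not the AIM from the paper.)
    """
    n = len(query_dist)
    refined = []
    for i in range(n):
        # The k gallery images closest to gallery image i (i itself has distance 0).
        neighbours = sorted(range(n), key=lambda j: gallery_dist[i][j])[:k]
        # Affinity weights: closer neighbours receive larger weights.
        weights = [math.exp(-gallery_dist[i][j] / temperature) for j in neighbours]
        total = sum(weights)
        # Affinity-weighted average of the neighbours' distances to the query.
        neighbour_avg = sum(w * query_dist[j] for w, j in zip(weights, neighbours)) / total
        refined.append((1 - alpha) * query_dist[i] + alpha * neighbour_avg)
    return refined


# Toy data: gallery images 0 and 1 are very close to each other,
# gallery image 2 is far from both.
gallery_dist = [
    [0.0, 0.1, 1.0],
    [0.1, 0.0, 1.0],
    [1.0, 1.0, 0.0],
]
# The query is close to image 0, but image 1 looks slightly farther than image 2.
query_dist = [0.2, 0.65, 0.6]

refined = refine_distances(query_dist, gallery_dist)
before = sorted(range(3), key=lambda j: query_dist[j])
after = sorted(range(3), key=lambda j: refined[j])
print(before)  # [0, 2, 1]
print(after)   # [0, 1, 2]
```

In this toy case, gallery image 1 moves ahead of image 2 because it is strongly affiliated with image 0, which the query already matches well. A larger `alpha` trusts the gallery affinity more, and a smaller `temperature` concentrates the weight on the very closest neighbours.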
ISSN:0045-7906
DOI:10.1016/j.compeleceng.2025.110126