Multi-View Label Prediction for Unsupervised Learning Person Re-Identification

Bibliographic Details
Published in: IEEE Signal Processing Letters, 2021, Vol. 28, pp. 1390-1394
Main authors: Yin, Qingze; Wang, Guan'an; Ding, Guodong; Gong, Shaogang; Tang, Zhenmin
Format: Article
Language: English
Description
Abstract: Person re-identification (ReID) aims to match pedestrian images across disjoint cameras. Existing supervised ReID methods train deep networks with identity-labeled images and therefore suffer from limited annotations. Recently, clustering-based unsupervised ReID has attracted increasing attention: it first clusters unlabeled images and assigns each cluster index as a pseudo-identity label, then trains a ReID model with these pseudo-identity labels. However, given the slight inter-class variations and significant intra-class variations among pedestrians, pseudo-identity labels produced by clustering algorithms are usually noisy and coarse. To alleviate these problems, besides clustering pseudo-identity labels, we propose to learn pseudo-patch labels, which brings two advantages: (1) patches naturally alleviate the effect of backgrounds, occlusions, and carried items, since these usually occupy only small parts of an image, thus reducing label noise; (2) it is plausible for patches from different pedestrians to share the same pseudo-patch label; for example, two pedestrians have a high probability of wearing either the same shoes or the same pants, but a low probability of wearing both. Experiments demonstrate that the proposed method achieves the best performance by a large margin on both image- and video-based datasets.
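The clustering-based pipeline summarized in the abstract can be sketched in a few lines of Python. The sketch below is illustrative only and is not the paper's code: extract_features and train_one_epoch are hypothetical placeholders for the current ReID backbone and one training pass, DBSCAN is a common choice for the clustering stage rather than a confirmed detail of the paper, and the horizontal-stripe split is an assumed simplification of the patch scheme.

# Illustrative sketch of the clustering-based pseudo-labeling loop described
# in the abstract. All names here are assumptions for illustration, not the
# paper's API: extract_features stands for the current ReID backbone and
# train_one_epoch for one training pass over the pseudo-labeled data.
from sklearn.cluster import DBSCAN

def extract_patches(image, n_stripes=4):
    # Split one pedestrian image, an (H, W, C) array, into horizontal stripes.
    # Clustering stripes independently of identities lets stripes from
    # different people (e.g. the same shoes) share a pseudo-patch label.
    h = image.shape[0] // n_stripes
    return [image[i * h:(i + 1) * h] for i in range(n_stripes)]

def pseudo_label_loop(images, extract_features, train_one_epoch, epochs=30):
    # images: (N, H, W, C) NumPy array of unlabeled pedestrian crops.
    for _ in range(epochs):
        # 1. Embed all unlabeled images with the current model.
        feats = extract_features(images)  # (N, D) feature matrix

        # 2. Cluster the embeddings; each cluster index becomes a
        #    pseudo-identity label. DBSCAN marks outliers with -1.
        labels = DBSCAN(eps=0.6, min_samples=4, metric="cosine").fit_predict(feats)

        # 3. Drop outliers and retrain the model with the pseudo labels.
        keep = labels != -1
        train_one_epoch(images[keep], labels[keep])

In the paper's setting, the same clustering step would presumably also be applied to the stripe features from extract_patches, yielding pseudo-patch labels that complement the noisier pseudo-identity labels.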
ISSN: 1070-9908, 1558-2361
DOI: 10.1109/LSP.2021.3090258