Intra-Inter Domain Similarity for Unsupervised Person Re-Identification

Bibliographic details
Published in: IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024-03, Vol. 46 (3), pp. 1711-1726
Main authors: Xuan, Shiyu; Zhang, Shiliang
Format: Article
Language: English
Description
Summary: Most unsupervised person Re-Identification (ReID) methods produce pseudo-labels by measuring feature similarity without considering the domain discrepancy among cameras, which degrades the accuracy of pseudo-labels computed across cameras. This paper addresses this challenge by decomposing the similarity computation into two stages: intra-domain and inter-domain. The intra-domain similarity directly leverages CNN features learned within each camera, and hence generates pseudo-labels on different cameras to train the ReID model in a multi-branch network. The inter-domain similarity treats the classification scores of each sample on different cameras as a new feature vector. This new feature effectively alleviates the domain discrepancy among cameras and generates more reliable pseudo-labels. We further propose the Instance and Camera Style Normalization (ICSN) to enhance robustness to domain discrepancy. ICSN alleviates intra-camera variations by adaptively learning a combination of instance and batch normalization. ICSN also boosts robustness to inter-camera variations through TNorm, which converts the original style of features into target styles. The proposed method achieves competitive performance on multiple datasets under fully unsupervised, intra-camera supervised, and domain generalization settings; e.g., it achieves a rank-1 accuracy of 64.4% on the MSMT17 dataset, outperforming recent unsupervised methods by more than 20%.
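The inter-domain step described in the summary can be illustrated with a small sketch. It is not the authors' implementation: the helper names (`score_features`, `cosine_similarity_matrix`), the linear per-camera classifiers, the toy random data, and the choice of cosine similarity are all assumptions made for illustration. The only idea taken from the summary is that each sample's classification scores on the different cameras are collected into a new feature vector, which is then compared across samples.

```python
import numpy as np

def softmax(logits, axis=-1):
    """Numerically stable softmax."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def score_features(features, camera_classifiers):
    """Build the new feature vector: for every sample, concatenate its
    classification scores under each camera's classifier.
    (Hypothetical helper; linear classifiers are an assumption.)"""
    parts = [softmax(features @ weights) for weights in camera_classifiers]
    return np.concatenate(parts, axis=1)

def cosine_similarity_matrix(x):
    """Pairwise cosine similarity (an assumed choice of similarity measure)."""
    normed = x / np.linalg.norm(x, axis=1, keepdims=True)
    return normed @ normed.T

# Toy data: 6 samples with 8-dimensional CNN features, and 2 cameras whose
# classifiers cover 3 and 4 pseudo-identities respectively.
rng = np.random.default_rng(0)
features = rng.normal(size=(6, 8))
camera_classifiers = [rng.normal(size=(8, 3)), rng.normal(size=(8, 4))]

scores = score_features(features, camera_classifiers)   # shape (6, 3 + 4)
similarity = cosine_similarity_matrix(scores)           # shape (6, 6)
```

In a pipeline of this kind, a matrix like `similarity` would typically be passed to a clustering step to assign cross-camera pseudo-labels; the summary does not specify that step, so it is left out here.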
ISSN: 0162-8828, 1939-3539, 2160-9292
DOI: 10.1109/TPAMI.2022.3163451