Generalizable Person Re-Identification via Viewpoint Alignment and Fusion
Format: Article
Language: English
Abstract: In current person re-identification (ReID) methods, most domain generalization work focuses on handling style differences between domains while largely ignoring unpredictable camera viewpoint changes, which we identify as another major factor behind the poor generalization of ReID methods. To tackle viewpoint change, this work proposes to use a 3D dense pose estimation model and a texture mapping module to map pedestrian images to canonical-view images. Because the texture mapping module is imperfect, the canonical-view images may lose discriminative detail cues from the original images, so using them directly for ReID would inevitably hurt performance. To handle this issue, we propose to fuse the original image and the canonical-view image via a transformer-based module. The key insight of this design is that the cross-attention mechanism in the transformer is well suited to aligning the discriminative texture cues of the original image with the canonical-view image, compensating for the latter's low-quality texture information. Through extensive experiments, we show that our method outperforms existing approaches in various evaluation settings.
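The abstract describes the fusion module only at a high level. The following is a minimal, hypothetical PyTorch sketch (not the authors' implementation) of how cross-attention could let canonical-view tokens query discriminative texture detail from original-image tokens; the class name `CrossViewFusion`, the token dimensions, and the feed-forward block are all assumptions.

```python
import torch
import torch.nn as nn

class CrossViewFusion(nn.Module):
    """Hypothetical sketch of a transformer-based fusion block:
    canonical-view tokens attend to original-image tokens so that
    discriminative texture cues lost by the texture mapping step
    can be pulled back into the canonical-view representation."""

    def __init__(self, dim: int = 768, num_heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim * 4),
            nn.GELU(),
            nn.Linear(dim * 4, dim),
        )

    def forward(self, canon_tokens: torch.Tensor, orig_tokens: torch.Tensor) -> torch.Tensor:
        # Queries come from the canonical-view image; keys/values come from
        # the original image, so the block can align and import texture detail.
        attn_out, _ = self.cross_attn(canon_tokens, orig_tokens, orig_tokens)
        x = self.norm1(canon_tokens + attn_out)
        return self.norm2(x + self.mlp(x))


# Usage with dummy patch-token sequences of shape (batch, tokens, dim):
fusion = CrossViewFusion(dim=768, num_heads=8)
canon = torch.randn(4, 196, 768)  # tokens from the canonical-view image
orig = torch.randn(4, 196, 768)   # tokens from the original image
fused = fusion(canon, orig)       # fused tokens, shape (4, 196, 768)
```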
DOI: 10.48550/arxiv.2212.02398