Saliency and Granularity: Discovering Temporal Coherence for Video-Based Person Re-Identification
Published in: | IEEE Transactions on Circuits and Systems for Video Technology, 2022-09, Vol. 32 (9), pp. 6100-6112 |
---|---|
Main authors: | , , , , , |
Format: | Article |
Language: | English |
Abstract: | Video-based person re-identification (ReID) matches the same person across video sequences that carry rich spatial and temporal information in complex scenes. Capturing discriminative information is highly challenging when occlusions and pose variations occur between frames. A key solution to this problem rests on extracting temporally invariant features from video sequences. In this paper, we propose a novel method for discovering temporal coherence by designing a region-level saliency and granularity mining network (SGMN). First, to address the problem of varying noisy frames, we design a temporal spatial-relation module (TSRM) that locates frame-level salient regions by adaptively modeling temporal relations along the spatial dimension through a probe-buffer mechanism. This avoids information redundancy between frames and captures the informative cues of each frame. Second, we propose a temporal channel-relation module (TCRM) to further mine the small-granularity information of each frame; it complements TSRM by concentrating on discriminative small-scale regions. TCRM exploits a one-and-rest difference relation along the channel dimension to enhance the granularity features, leading to stronger robustness against misalignment. Finally, we evaluate SGMN on four representative video-based datasets (iLIDS-VID, MARS, DukeMTMC-VideoReID, and LS-VID), and the results indicate the effectiveness of the proposed method. |
---|---|
ISSN: | 1051-8215 1558-2205 |
DOI: | 10.1109/TCSVT.2022.3157130 |
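
The abstract names a "one-and-rest difference relation" along the channel dimension but does not give its formulation. The following is only a minimal sketch of one possible reading, assuming each frame is summarized by a channel vector and that "rest" means the channel-wise mean of all other frames in the sequence. The function name `one_and_rest_difference` and the toy data are hypothetical and are not taken from the paper.

```python
from typing import List


def one_and_rest_difference(frames: List[List[float]]) -> List[List[float]]:
    """Assumed reading, not the paper's definition: for each frame's channel
    vector, subtract the channel-wise mean of all the other frames."""
    t = len(frames)
    if t < 2:
        raise ValueError("need at least two frames to form a 'rest' set")
    c = len(frames[0])
    # Channel-wise totals over the whole sequence, computed once.
    totals = [sum(f[k] for f in frames) for k in range(c)]
    diffs = []
    for f in frames:
        # Mean of the remaining t - 1 frames, per channel.
        rest_mean = [(totals[k] - f[k]) / (t - 1) for k in range(c)]
        diffs.append([f[k] - rest_mean[k] for k in range(c)])
    return diffs


# Toy sequence: 3 frames, 2 channels. Channel 0 varies, channel 1 is constant.
frames = [[1.0, 2.0], [3.0, 2.0], [5.0, 2.0]]
print(one_and_rest_difference(frames))
# → [[-3.0, 0.0], [0.0, 0.0], [3.0, 0.0]]
```

In this toy case the constant channel yields zero differences for every frame, while the varying channel highlights the frames that deviate from the rest. That is the kind of frame-specific cue such a relation could plausibly emphasize, under the assumptions stated above.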