Learning discriminative features with a dual-constrained guided network for video-based person re-identification
Published in: Multimedia Tools and Applications, 2021-08, Vol. 80 (19), p. 28673-28696
Main authors: , , , , ,
Format: Article
Language: English
Keywords:
Online access: Full text
Abstract: Video-based person re-identification (ReID) aims to match pedestrians across different cameras in a large video gallery. However, interference factors common in real-world scenarios, such as occlusion, pose variations and new appearances, make ReID a challenging task. Most existing methods learn the features of each frame independently, without exploiting the complementary information between frames, so the extracted frame features lack the discriminability needed to handle these problems. In this paper, we propose a novel dual-constrained guided network (DCGN) that captures discriminative features by modeling the relations across frames in two steps. First, to learn frame-level discriminative features, we design a frame-constrained module (FCM) that learns channel attention weights by combining intra-frame and inter-frame information. Second, we propose a sequence-constrained module (SCM) that determines the importance of each frame in a video; it models the relations between frame-level and sequence-level features, alleviating frame redundancy from a global perspective. We conduct comparison experiments on four representative datasets: MARS, DukeMTMC-VideoReID, iLIDS-VID and PRID2011. The Rank-1 accuracy reaches 89.65%, 95.35%, 78.51% and 90.82% on these datasets, outperforming the second-best method by 2.35%, 1.35%, 3.41% and 2.72%, respectively.
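The abstract names the two attention steps but gives no formulas, so the sketch below is only an illustrative reading of them: a squeeze-and-excitation-style channel attention for the frame-constrained step, and a softmax frame weighting against a mean-pooled sequence feature for the sequence-constrained step. The class names (FrameConstrainedModule, SequenceConstrainedModule), layer sizes, pooling choices and concatenation-based fusion are assumptions for illustration, not the paper's actual design.

```python
# Minimal sketch of the two attention steps described in the abstract.
# Layer sizes, pooling and fusion choices are assumptions; the paper may differ.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FrameConstrainedModule(nn.Module):
    """Per-frame channel attention guided by intra- and inter-frame statistics (assumed design)."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Bottleneck MLP producing one weight per channel.
        self.mlp = nn.Sequential(
            nn.Linear(2 * channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, T, C, H, W) -- B video clips, T frames each.
        b, t, c, h, w = x.shape
        intra = x.mean(dim=(3, 4))                                 # (B, T, C) per-frame descriptor
        inter = intra.mean(dim=1, keepdim=True).expand_as(intra)   # (B, T, C) clip-level descriptor
        weights = self.mlp(torch.cat([intra, inter], dim=-1))      # (B, T, C) channel weights
        return x * weights.view(b, t, c, 1, 1)


class SequenceConstrainedModule(nn.Module):
    """Weights each frame by its relation to a sequence-level feature (assumed design)."""

    def __init__(self, channels: int):
        super().__init__()
        self.score = nn.Linear(2 * channels, 1)

    def forward(self, frame_feats: torch.Tensor) -> torch.Tensor:
        # frame_feats: (B, T, C) frame-level feature vectors.
        seq_feat = frame_feats.mean(dim=1, keepdim=True)                           # (B, 1, C)
        pairs = torch.cat([frame_feats, seq_feat.expand_as(frame_feats)], dim=-1)  # (B, T, 2C)
        alpha = F.softmax(self.score(pairs), dim=1)                                # (B, T, 1)
        return (alpha * frame_feats).sum(dim=1)                                    # (B, C) clip feature


if __name__ == "__main__":
    clips = torch.randn(2, 8, 256, 16, 8)      # 2 clips, 8 frames, 256-channel feature maps
    fcm = FrameConstrainedModule(256)
    scm = SequenceConstrainedModule(256)
    refined = fcm(clips)                        # (2, 8, 256, 16, 8)
    frame_vecs = refined.mean(dim=(3, 4))       # (2, 8, 256) pooled frame features
    clip_feature = scm(frame_vecs)              # (2, 256)
    print(clip_feature.shape)
```

In this reading, the FCM injects inter-frame context by concatenating each frame's pooled descriptor with the clip mean before predicting channel weights, and the SCM suppresses redundant frames by scoring each one against the mean-pooled sequence feature; both choices stand in for whatever fusion the paper actually uses.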
ISSN: 1380-7501, 1573-7721
DOI: 10.1007/s11042-021-11072-y